Aspect-based Query Expansion for Search Results Diversification

Query expansion is a prominent method for reformulating under-specified and ambiguous queries so that they retrieve relevant documents more effectively. This paper aims to diversify retrieved web documents using a novel query expansion method. Our method leverages query suggestions and completions to pool the candidate terms used for the final expansion. We then group similar candidates into clusters; to this end, a soft clustering algorithm is applied to the candidates, where each cluster represents a different query aspect. The query is then expanded using terms selected from the cluster labels. Multiple lexical and semantic features are introduced to compute the relevance between a query and a document. The lexical features use the textual contents of the query and document, while the semantic features leverage word-level semantics from a word2vec model. Finally, a weighted linear ranking approach uses the extracted features to rank the documents retrieved for the query. Experimental results on the ClueWeb09 document collection using the TREC 2012 Web Track queries demonstrate that the proposed aspect-based query expansion method is effective at diversifying the retrieved documents and outperforms the baseline and several known related methods in terms of the diversity metrics ERR-IA, α-nDCG, and NRBP at a cutoff of 20.
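The clustering and expansion steps can be illustrated with a small sketch. The abstract does not name the soft clustering algorithm, so fuzzy c-means is swapped in here as one common soft clustering method; representing each candidate term as a vector (for example, a word embedding), taking a cluster's highest-membership term as its "label", and the helper names `fuzzy_c_means` and `expand_query` are all assumptions for illustration, not the paper's implementation.

```python
import math


def _memberships(points, centers, m):
    """Fuzzy c-means membership update: u[i][j] in [0, 1], each row sums to 1."""
    k = len(centers)
    u = []
    for p in points:
        # Clamp distances so a point lying exactly on a center does not divide by zero.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        u.append([1.0 / sum((d[j] / d[l]) ** (2.0 / (m - 1.0)) for l in range(k))
                  for j in range(k)])
    return u


def fuzzy_c_means(points, k, m=2.0, iters=50):
    """Soft-cluster points into k clusters (assumed algorithm); returns (centers, memberships)."""
    # Deterministic farthest-point initialization of the k centers.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    u = _memberships(points, centers, m)
    for _ in range(iters):
        # Each center is the mean of all points weighted by membership ** m.
        centers = []
        for j in range(k):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centers.append([sum(wi * p[dim] for wi, p in zip(w, points)) / total
                            for dim in range(len(points[0]))])
        u = _memberships(points, centers, m)
    return centers, u


def expand_query(query, candidates, vectors, k, terms_per_aspect=1):
    """Hypothetical helper: append each cluster's top-membership term(s) to the query."""
    _, u = fuzzy_c_means([vectors[t] for t in candidates], k)
    expansion = []
    for j in range(k):
        ranked = sorted(range(len(candidates)), key=lambda i: u[i][j], reverse=True)
        for i in ranked[:terms_per_aspect]:
            if candidates[i] not in expansion:
                expansion.append(candidates[i])
    return query.split() + expansion
```

With toy vectors such as `{"engine": [1.0, 0.0], "dealer": [0.9, 0.1], "habitat": [0.0, 1.0], "prey": [0.1, 0.9]}`, `expand_query("jaguar", list(vectors), vectors, k=2)` keeps the original query term and adds one term from the "car" aspect and one from the "animal" aspect. Because memberships are soft, a term could in principle contribute to several aspects; the `terms_per_aspect` knob is a hypothetical way to control how many terms each aspect contributes.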
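The final ranking step combines lexical and semantic features linearly. The sketch below assumes one feature of each kind: the fraction of query terms that appear in the document (lexical) and the cosine similarity between the mean word vectors of the query and document (semantic, in the spirit of word2vec). The specific features, the toy 2-dimensional vectors standing in for a trained word2vec model, the equal weights, and the function names are illustrative assumptions; the paper's actual feature set and learned weights are not given in the abstract.

```python
import math


def term_overlap(query_terms, doc_terms):
    """Lexical feature (assumed): fraction of query terms that occur in the document."""
    doc = set(doc_terms)
    return sum(t in doc for t in query_terms) / len(query_terms)


def mean_vector(terms, vectors):
    """Average the embeddings of the terms that have one; None if none do."""
    known = [vectors[t] for t in terms if t in vectors]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_similarity(query_terms, doc_terms, vectors):
    """Semantic feature (assumed): cosine of mean query and mean document embeddings."""
    q, d = mean_vector(query_terms, vectors), mean_vector(doc_terms, vectors)
    return cosine(q, d) if q is not None and d is not None else 0.0


def rank(query, docs, vectors, weights=(0.5, 0.5)):
    """Score each document as a weighted linear sum of its features; best first."""
    q = query.lower().split()
    scored = []
    for doc_id, text in docs.items():
        d = text.lower().split()
        features = (term_overlap(q, d), semantic_similarity(q, d, vectors))
        scored.append((sum(w * f for w, f in zip(weights, features)), doc_id))
    # Stable sort on score only, so ties keep their input order.
    scored.sort(key=lambda s: s[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

For the query "jaguar engine", a document containing both terms scores highest, a document about "car engine repair" follows thanks to partial overlap plus embedding similarity, and a "big cat habitat" document ranks last. In practice the weights would be tuned on training queries rather than fixed at 0.5.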
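To make the reported diversity metric concrete, the following is a sketch of the standard α-nDCG@k: at rank r, each subtopic a document covers contributes (1 - α) raised to the number of higher-ranked documents that already covered it, discounted by log2(r + 1), and the total is normalized by an ideal ranking. The ideal ranking is approximated greedily here because computing it exactly is NP-hard. The value α = 0.5, the qrels format (document id mapped to the set of subtopics it is relevant to), and the function names are assumptions for illustration; official TREC scores come from the track's own evaluation tooling.

```python
import math


def alpha_dcg(ranking, qrels, alpha=0.5, k=20):
    """alpha-DCG@k: a subtopic's gain decays by (1 - alpha) each time it is covered again."""
    seen = {}
    total = 0.0
    for r, doc in enumerate(ranking[:k], start=1):
        gain = 0.0
        for s in qrels.get(doc, ()):
            gain += (1 - alpha) ** seen.get(s, 0)
            seen[s] = seen.get(s, 0) + 1
        total += gain / math.log2(r + 1)
    return total


def ideal_ranking(qrels, alpha=0.5, k=20):
    """Greedy approximation of the ideal ranking (the exact ideal is NP-hard)."""
    pool, seen, ideal = set(qrels), {}, []
    while pool and len(ideal) < k:
        # Pick the document with the largest marginal gain given what is already covered.
        best = max(sorted(pool),
                   key=lambda d: sum((1 - alpha) ** seen.get(s, 0) for s in qrels[d]))
        ideal.append(best)
        pool.remove(best)
        for s in qrels[best]:
            seen[s] = seen.get(s, 0) + 1
    return ideal


def alpha_ndcg(ranking, qrels, alpha=0.5, k=20):
    """alpha-nDCG@k normalized by the greedy ideal ranking."""
    ideal = alpha_dcg(ideal_ranking(qrels, alpha, k), qrels, alpha, k)
    return alpha_dcg(ranking, qrels, alpha, k) / ideal if ideal else 0.0
```

With `qrels = {"a": {1}, "b": {1}, "c": {2}}`, the ranking `["a", "c", "b"]` covers both subtopics in the first two positions and scores 1.0, while `["a", "b", "c"]` repeats subtopic 1 before reaching subtopic 2 and scores lower, which is exactly the redundancy penalty that rewards diversified result lists.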