Supervised Search Result Diversification via Subtopic Attention

Search result diversification aims to retrieve diverse results to satisfy as many different information needs as possible. Supervised methods have been proposed recently to learn ranking functions and they have been shown to produce superior results to unsupervised methods. However, these methods use implicit approaches based on the principle of Maximal Marginal Relevance (MMR). In this paper, we propose a learning framework for explicit result diversification where subtopics are explicitly modeled. Based on the information contained in the sequence of selected documents, we use the attention mechanism to capture the subtopics to be focused on while selecting the next document, which naturally fits our task of document selection for diversification. As a preliminary attempt, we employ recurrent neural networks and max pooling to instantiate the framework. We use both distributed representations and traditional relevance features to model documents in the implementation. The framework is flexible to model query intent in either a flat list or a hierarchy. Experimental results show that the proposed method significantly outperforms all the existing search result diversification approaches.

[1]  Tie-Yan Liu,et al.  Listwise approach to learning to rank: theory and algorithm , 2008, ICML '08.

[2]  Enrique Alfonseca,et al.  Generalized syntactic and semantic models of query reformulation , 2010, SIGIR.

[3]  Farzin Maghoul,et al.  Query clustering using click-through graph , 2009, WWW '09.

[4]  Ismail Sengör Altingövde,et al.  Scalable and Efficient Web Search Result Diversification , 2016, ACM Trans. Web.

[5]  Charles L. A. Clarke,et al.  An Effectiveness Measure for Ambiguous and Underspecified Queries , 2009, ICTIR.

[6]  Tetsuya Sakai,et al.  Evaluating Search Result Diversity using Intent Hierarchies , 2016, SIGIR.

[7]  Xueqi Cheng,et al.  Learning for search result diversification , 2014, SIGIR.

[8]  Jade Goldstein-Stewart,et al.  The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries , 1998, SIGIR Forum.

[9]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[10]  Fuji Ren,et al.  Search Result Diversification via Filling Up Multiple Knapsacks , 2014, CIKM.

[11]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[12]  Olfa Nasraoui,et al.  Mining search engine query logs for query recommendation , 2006, WWW '06.

[13]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[14]  Tie-Yan Liu,et al.  Learning to rank for information retrieval , 2009, SIGIR.

[15]  Charles L. A. Clarke,et al.  Novelty and diversity in information retrieval evaluation , 2008, SIGIR '08.

[16]  Olivier Chapelle,et al.  Expected reciprocal rank for graded relevance , 2009, CIKM.

[17]  W. Bruce Croft,et al.  Diversity by proportionality: an election-based approach to search result diversification , 2012, SIGIR '12.

[18]  Craig MacDonald,et al.  Explicit Search Result Diversification through Sub-queries , 2010, ECIR.

[19]  Sreenivas Gollapudi,et al.  Diversifying search results , 2009, WSDM '09.

[20]  Ricardo A. Baeza-Yates,et al.  Query Recommendation Using Query Logs in Search Engines , 2004, EDBT Workshops.

[21]  Ben Carterette,et al.  An analysis of NP-completeness in novelty and diversity ranking , 2009, Information Retrieval.

[22]  Craig MacDonald,et al.  Exploiting query reformulations for web search result diversification , 2010, WWW '10.

[23]  Yiqun Liu,et al.  Overview of the NTCIR-10 INTENT-2 Task , 2013, NTCIR.

[24]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[25]  W. Bruce Croft,et al.  Term level search result diversification , 2013, SIGIR.

[26]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[27]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[28]  Tetsuya Sakai,et al.  Search Result Diversification Based on Hierarchical Intents , 2015, CIKM.

[29]  Danqi Chen,et al.  Reasoning With Neural Tensor Networks for Knowledge Base Completion , 2013, NIPS.

[30]  Larry P. Heck,et al.  Learning deep structured semantic models for web search using clickthrough data , 2013, CIKM.

[31]  Jade Goldstein-Stewart,et al.  The use of MMR, diversity-based reranking for reordering documents and producing summaries , 1998, SIGIR '98.

[32]  Tetsuya Sakai,et al.  Evaluating diversified search results using per-intent graded relevance , 2011, SIGIR.

[33]  Xueqi Cheng,et al.  Modeling Document Novelty with Neural Tensor Network for Search Result Diversification , 2016, SIGIR.

[34]  Evaggelia Pitoura,et al.  Search result diversification , 2010, SGMD.

[35]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[36]  Jun Wang,et al.  Top-k Retrieval Using Facility Location Analysis , 2012, ECIR.

[37]  Alex Graves,et al.  Recurrent Models of Visual Attention , 2014, NIPS.

[38]  Yiqun Liu,et al.  Overview of the NTCIR-11 IMine Task , 2014, NTCIR.

[39]  Alessandro Moschitti,et al.  Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks , 2015, SIGIR.

[40]  J. Marden Analyzing and Modeling Rank Data , 1996 .

[41]  Thorsten Joachims,et al.  Predicting diverse subsets using structural SVMs , 2008, ICML '08.

[42]  Xueqi Cheng,et al.  Learning Maximal Marginal Relevance Model via Directly Optimizing Diversity Evaluation Measures , 2015, SIGIR.

[43]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..