Learning to Diversify Search Results via Subtopic Attention

Search result diversification aims to retrieve diverse results that satisfy as many different information needs as possible. Supervised methods that learn ranking functions have recently been proposed and have been shown to outperform unsupervised methods. However, these methods take implicit approaches based on the principle of Maximal Marginal Relevance (MMR). In this paper, we propose a learning framework for explicit result diversification, in which subtopics are explicitly modeled. Based on the information contained in the sequence of already selected documents, we use an attention mechanism to capture the subtopics to focus on when selecting the next document, which naturally fits the task of document selection for diversification. The framework is implemented using recurrent neural networks and max-pooling, which combine distributed representations and traditional relevance features. Our experiments show that the proposed method significantly outperforms all existing methods.
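
The selection loop described above can be illustrated with a small sketch. This is not the model proposed in the paper: the learned recurrent encoder and attention network are replaced by a hand-written heuristic (a softmax over the negated accumulated coverage per subtopic), and the names `subtopic_attention`, `diversify`, `relevance`, `coverage`, and the trade-off weight `lam` are all assumptions introduced for illustration only. It shows the general idea that the documents already selected determine which subtopics receive attention when the next document is chosen.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def subtopic_attention(selected_cov, n_subtopics):
    """Heuristic stand-in for learned attention (an assumption, not the
    paper's network): subtopics that the selected documents already
    cover well receive less attention."""
    covered = [sum(c[k] for c in selected_cov) for k in range(n_subtopics)]
    return softmax([-c for c in covered])


def diversify(relevance, coverage, k, lam=0.5):
    """Greedily pick k documents.

    relevance[d]   : query relevance score of document d
    coverage[d][j] : how well document d covers subtopic j, in [0, 1]
    lam            : hypothetical trade-off between relevance and
                     attention-weighted subtopic coverage
    """
    n_subtopics = len(coverage[0])
    remaining = list(range(len(relevance)))
    ranking = []
    while remaining and len(ranking) < k:
        # Attention depends only on the sequence selected so far.
        attn = subtopic_attention([coverage[d] for d in ranking], n_subtopics)

        def score(d):
            div = sum(a * c for a, c in zip(attn, coverage[d]))
            return lam * relevance[d] + (1 - lam) * div

        best = max(remaining, key=score)
        ranking.append(best)
        remaining.remove(best)
    return ranking


# Toy data: documents 0 and 1 cover subtopic 0, document 2 covers subtopic 1.
relevance = [0.9, 0.8, 0.7]
coverage = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(diversify(relevance, coverage, k=3))  # [0, 2, 1]
```

In the toy data, document 2 is the least relevant but is ranked second, because once document 0 is selected the attention shifts toward the uncovered subtopic. Setting `lam=1.0` reduces the sketch to a plain relevance ranking.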
