Learning for search result diversification

Search result diversification has gained attention as a way to address users' ambiguous or multi-faceted information needs. Most existing methods for this problem rely on a heuristic, predefined ranking function, which can incorporate only limited features and requires extensive tuning for different settings. In this paper, we address search result diversification as a learning problem and introduce a novel relational learning-to-rank approach to formulate the task. Defining the ranking function and the loss function for diversification, however, is challenging. We first show, from both empirical and theoretical perspectives, that diverse ranking is in general a sequential selection process. On this basis, we define the ranking function as a combination of a relevance score and a diversity score between the current document and those previously selected. We define the loss function as the likelihood loss of the ground truth under the Plackett-Luce model, which naturally models the sequential generation of a diverse ranking list. Stochastic gradient descent is then used to solve the resulting unconstrained optimization problem, and a diverse ranking list is predicted by a sequential selection process based on the learned ranking function. Experimental results on the public TREC datasets demonstrate the effectiveness and robustness of our approach.
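The sequential selection and the Plackett-Luce likelihood loss described above can be illustrated with a minimal sketch. It is not the paper's implementation: the linear relevance term, the use of the minimum dissimilarity to already-selected documents as the diversity score, the scalar weight `w_div`, the `topic_dissim` function, and the toy documents are all assumptions made for illustration. Training would minimize the loss over the weights with stochastic gradient descent; the gradient step is omitted here.

```python
import math

def score(doc, selected, w_rel, w_div, dissim):
    """Relevance score plus diversity score against already-selected docs."""
    relevance = sum(w * x for w, x in zip(w_rel, doc["features"]))
    # Assumption: diversity is the minimum dissimilarity to any selected
    # document; with nothing selected yet, the diversity term is zero.
    diversity = min((dissim(doc, s) for s in selected), default=0.0)
    return relevance + w_div * diversity

def greedy_rank(docs, w_rel, w_div, dissim):
    """Predict a ranking by repeatedly picking the highest-scoring candidate."""
    remaining = list(range(len(docs)))
    selected = []
    while remaining:
        best = max(
            remaining,
            key=lambda i: score(docs[i], selected, w_rel, w_div, dissim),
        )
        selected.append(docs[best])
        remaining.remove(best)
    return selected

def plackett_luce_loss(ranking, w_rel, w_div, dissim):
    """Negative log-likelihood of a ground-truth ranking, position by position."""
    loss = 0.0
    for j in range(len(ranking)):
        selected = ranking[:j]
        # Scores of the documents still available at position j;
        # the ground-truth choice is the first one.
        scores = [score(d, selected, w_rel, w_div, dissim) for d in ranking[j:]]
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        loss -= scores[0] - log_norm
    return loss

def topic_dissim(a, b):
    """Hypothetical dissimilarity: 1 if the topics differ, else 0."""
    return 0.0 if a["topic"] == b["topic"] else 1.0

# Hypothetical toy documents for an ambiguous query.
docs = [
    {"id": "a", "features": [0.9], "topic": "car"},
    {"id": "b", "features": [0.8], "topic": "car"},
    {"id": "c", "features": [0.6], "topic": "animal"},
]

ranked = greedy_rank(docs, w_rel=[1.0], w_div=0.5, dissim=topic_dissim)
print([d["id"] for d in ranked])
```

With these toy weights, the less relevant document on a new topic is promoted above the second document on the already-covered topic, and the diverse ordering receives a lower loss than the purely relevance-sorted one.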
