A Concise Integer Linear Programming Formulation for Implicit Search Result Diversification

Search result diversification (SRD) is a key technique for coping with ambiguous and/or underspecified queries, and it has attracted considerable attention. This paper focuses on implicit SRD, where the possible subtopics underlying a query are unknown beforehand. We formulate implicit SRD as the process of selecting and ranking k exemplar documents using integer linear programming (ILP). Unlike the common practice of relying on approximate methods, this formulation allows us to obtain the optimal solution of the objective function. Extensive experiments on four benchmark collections reveal that: (1) factors such as the initial run, the number of input documents, the query type, and the way document similarity is computed significantly affect the performance of diversification models, so careful examination of these factors is highly recommended when developing implicit SRD methods; (2) the proposed method achieves substantially better performance than state-of-the-art unsupervised methods for implicit SRD.
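To illustrate what "optimal rather than approximate" exemplar selection means, here is a minimal sketch. The abstract does not spell out the objective, so the sketch assumes a generic facility-location / k-medoids style objective rather than the paper's exact formulation. It chooses k exemplars, rewards their relevance `rel[j]`, and rewards how well each remaining document is covered by its most similar exemplar `sim[i][j]`, traded off by a weight `lam`. In ILP form, a binary x_ij would mark document i as represented by exemplar j, subject to Σ_j x_ij = 1, x_ij ≤ x_jj and Σ_j x_jj = k. Once the exemplars are fixed, the best assignment sends each document to its most similar exemplar, so enumerating every k-subset reaches the same exact optimum on tiny inputs. The sketch therefore uses brute-force enumeration instead of an ILP solver. All names (`rel`, `sim`, `lam`, `exemplar_objective`, `select_and_rank`) and the relevance-based ranking rule are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def exemplar_objective(
    exemplars: Sequence[int],
    rel: Sequence[float],
    sim: Sequence[Sequence[float]],
    lam: float,
) -> float:
    """Score a fixed exemplar set under the assumed (hypothetical) objective.

    Exemplars contribute their relevance; every non-exemplar document is
    assigned to its most similar exemplar, which is the optimal assignment
    once the exemplar set is fixed.
    """
    chosen = set(exemplars)
    relevance = sum(rel[j] for j in exemplars)
    coverage = sum(
        max(sim[i][j] for j in exemplars)
        for i in range(len(rel))
        if i not in chosen
    )
    return lam * relevance + (1.0 - lam) * coverage


def select_and_rank(
    rel: Sequence[float],
    sim: Sequence[Sequence[float]],
    k: int,
    lam: float = 0.5,
) -> Tuple[List[int], float]:
    """Exactly maximise the assumed objective by enumerating all k-subsets.

    Exponential in the number of documents, so only usable on tiny inputs;
    an ILP solver is what makes exact optimisation practical at scale.
    """
    n = len(rel)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of documents")
    best_set: Tuple[int, ...] = ()
    best_value = float("-inf")
    for subset in combinations(range(n), k):
        value = exemplar_objective(subset, rel, sim, lam)
        if value > best_value:
            best_set, best_value = subset, value
    # Assumed ranking rule: order the chosen exemplars by relevance.
    ranking = sorted(best_set, key=lambda j: rel[j], reverse=True)
    return ranking, best_value


if __name__ == "__main__":
    # Two near-duplicate pairs: {0, 1} and {2, 3}.
    rel = [0.9, 0.85, 0.6, 0.5]
    sim = [
        [1.0, 0.95, 0.1, 0.1],
        [0.95, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.9],
        [0.1, 0.1, 0.9, 1.0],
    ]
    ranking, value = select_and_rank(rel, sim, k=2, lam=0.5)
    print(ranking)  # [0, 2]
```

In the toy data, ranking purely by relevance would return the near-duplicates 0 and 1. The exact optimum of the assumed objective instead picks one document from each pair, which is the diversification behaviour an exemplar-based formulation aims for.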
