An Optimization Framework for Merging Multiple Result Lists

Developing effective methods for fusing multiple ranked lists of documents is crucial to many applications. Federated web search, for instance, has become common practice: a query is issued to several verticals, and the results are blended into a single ranked list. Whereas federated search is regarded as collection fusion, data fusion techniques aim to improve search coverage and precision by combining multiple search runs over a single document collection. In this paper, we study in depth and extend a neural network-based approach, LambdaMerge, for merging ranked lists drawn from one vertical (i.e., data fusion) or from several verticals (i.e., collection fusion). The proposed model accounts for the quality of documents, ranked lists, and verticals when producing the final merged result within an optimization framework. We further investigate the potential of incorporating deep structures into the model, with the aim of finding better combinations of the different sources of evidence. In experiments on collection fusion and data fusion, the proposed approach significantly outperforms several standard baselines and state-of-the-art learning-based approaches.
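
To make the data fusion setting concrete, the following is a minimal sketch of an unsupervised score-combination baseline in the style of CombSUM. It is not LambdaMerge and is not claimed to be one of the baselines evaluated in the paper; the function names, the min-max normalization step, and the toy runs are all assumptions chosen for illustration.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as equally good.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs):
    """Fuse several runs by summing each document's normalized scores.

    `runs` is a list of {doc_id: score} mappings, one per ranked list.
    A document missing from a run contributes 0 for that run.
    Returns (doc_id, fused_score) pairs, best first; ties are broken
    by doc_id so the output is deterministic.
    """
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] += score
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy runs over one collection (the data fusion case).
run_a = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
run_b = {"d2": 0.9, "d4": 0.5, "d1": 0.1}

merged = comb_sum([run_a, run_b])
for doc, score in merged:
    print(doc, round(score, 3))
```

A fixed rule like this weights every list equally and ignores how good each document, list, or vertical is. A learned merging model, as described in the abstract, instead takes those quality signals into account when producing the final ranking.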
