Fusion in Information Retrieval: SIGIR 2018 Half-Day Tutorial

Fusion is a central concept in Information Retrieval. The goal of fusion methods is to merge different sources of information so as to address a retrieval task. For example, in the ad hoc retrieval setting, fusion methods have been applied to merge multiple document lists retrieved for a query. The lists could be retrieved using different query representations, document representations, ranking functions, and corpora. The goal of this half-day, intermediate-level tutorial is to provide a methodological view of the theoretical foundations of fusion approaches, the numerous fusion methods that have been devised, and the variety of applications to which fusion techniques have been applied.
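As a concrete illustration of merging multiple document lists retrieved for a query, the following is a minimal sketch of reciprocal rank fusion, one widely used rank-based fusion method. It is not taken from the tutorial itself: the function name, the document identifiers, the example runs, and the tie-breaking rule are all assumptions made for illustration, and k=60 is only the conventional default for this method.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives the sum of 1 / (k + rank) over the lists in
    which it appears, with ranks starting at 1. The constant k dampens
    the influence of top ranks; 60 is the conventional default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by document id so the
    # output is deterministic (an illustrative choice, not part of the method).
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical runs, e.g. produced by three different ranking functions
# for the same query.
runs = [
    ["d1", "d2", "d3"],
    ["d2", "d3", "d4"],
    ["d2", "d1", "d5"],
]
fused = reciprocal_rank_fusion(runs)
```

Here d2 appears in all three lists and at high ranks, so it ends up first in the fused list. Score-based methods would instead combine normalized retrieval scores rather than ranks.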
