Reducing the Uncertainty in Resource Selection

The distributed retrieval process is plagued by uncertainty. Sampling, selection, merging and ranking are all based on very limited information compared to centralized retrieval. In this paper, we focus on reducing the uncertainty within the resource selection phase by obtaining a number of estimates, rather than relying on a single point estimate. We propose three methods for reducing uncertainty and compare them against state-of-the-art baselines across three distributed retrieval testbeds. Our results show that the proposed methods significantly improve on the baselines, reduce uncertainty, and improve the robustness of resource selection.
