A decision mechanism for the selective combination of evidence in topic distillation

The combination of evidence can increase retrieval effectiveness. In this paper, we investigate the effectiveness of a decision mechanism for the selective combination of evidence in Web information retrieval, particularly for topic distillation. We introduce two measures of a query’s broadness and use them to select an appropriate combination of evidence for each query. Our experimental results show a statistically significant association between the output of the decision mechanism and the relative effectiveness of the different combinations of evidence. Moreover, we show that the proposed methodology can be applied in an operational setting, where relevance information is not available, by setting the decision mechanism’s thresholds automatically.
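As a rough illustration of the idea of a per-query decision mechanism, the sketch below scores a query's broadness and compares it with a threshold to pick a combination of evidence. It is not the paper's method: the broadness measure (fraction of documents containing every query term), the threshold value, the toy index, and the strategy labels `content+links` and `content-only` are all assumptions made for this example.

```python
from typing import Dict, Set


def broadness(query: str, index: Dict[str, Set[str]], n_docs: int) -> float:
    """Hypothetical broadness measure (an assumption, not the paper's):
    the fraction of documents that contain every query term."""
    terms = query.lower().split()
    if not terms or n_docs == 0:
        return 0.0
    # Posting set for each term; unknown terms match no documents.
    postings = [index.get(term, set()) for term in terms]
    matching = set.intersection(*postings)
    return len(matching) / n_docs


def select_combination(score: float, threshold: float) -> str:
    """Decision step: broad queries get one combination of evidence,
    narrow queries another. The labels are illustrative only."""
    return "content+links" if score >= threshold else "content-only"


# Toy inverted index (term -> ids of documents containing it); hypothetical data.
INDEX: Dict[str, Set[str]] = {
    "university": {"d1", "d2", "d3", "d4"},
    "admissions": {"d1", "d2", "d3"},
    "glasgow": {"d4"},
}
N_DOCS = 5
THRESHOLD = 0.5  # assumed value; the paper sets its thresholds automatically

broad_choice = select_combination(
    broadness("university admissions", INDEX, N_DOCS), THRESHOLD
)
narrow_choice = select_combination(
    broadness("glasgow admissions", INDEX, N_DOCS), THRESHOLD
)
```

In this toy setup the first query matches three of five documents and is treated as broad, while the second matches none and is treated as narrow. In practice the threshold would be set from data rather than fixed by hand.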
