Authority and ranking effects in data fusion

This paper provides empirical support for some of the key assumptions that guide the design of data fusion methods. It computes and analyzes the overlap structures between the search results of retrieval systems that participated in the short, long, and manual tracks of TREC 3, 6, 7, and 8, to examine what these structures reveal about a document's probability of being relevant. The paper shows that the potential relevance of a document increases exponentially as the number of systems retrieving it increases, which is called the Authority Effect. It also shows that documents that appear higher up in ranked lists and are found by more systems are more likely to be relevant, which is called the Ranking Effect. A further contribution is to show that the Authority and Ranking Effects can be observed regardless of whether queries are generated manually or automatically and whether they are short or long. The effects can also be observed when the result sets of random groupings of five retrieval systems are compared and only the top 50 results are used in the overlap computation. Finally, the paper discusses how the Authority and Ranking Effects can help explain why major data fusion methods perform well. © 2008 Wiley Periodicals, Inc.
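The overlap computation described above can be illustrated with a minimal sketch. It is not the paper's actual procedure: the function name `overlap_relevance`, the input layout (a dict of ranked lists plus a set of relevant document ids), and the toy data are all assumptions made for illustration. The sketch counts how many systems retrieve each document within a given depth, then reports the fraction of relevant documents for each overlap count, which is the quantity the Authority Effect is about.

```python
from collections import Counter, defaultdict


def overlap_relevance(runs, relevant, depth=50):
    """Fraction of relevant documents per overlap count (hypothetical helper).

    runs:     dict mapping system name -> ranked list of document ids
    relevant: set of document ids judged relevant
    depth:    number of top-ranked results per system to consider
    """
    # Count how many systems retrieve each document within the cutoff.
    found_by = Counter()
    for ranked in runs.values():
        for doc in set(ranked[:depth]):
            found_by[doc] += 1

    # Tally documents and relevant documents per overlap count.
    totals = defaultdict(int)
    hits = defaultdict(int)
    for doc, n_systems in found_by.items():
        totals[n_systems] += 1
        if doc in relevant:
            hits[n_systems] += 1

    return {n: hits[n] / totals[n] for n in sorted(totals)}


# Toy data, invented for illustration only.
runs = {
    "A": ["d1", "d2", "d3"],
    "B": ["d1", "d2", "d4"],
    "C": ["d1", "d5", "d6"],
}
relevant = {"d1", "d2", "d5"}
result = overlap_relevance(runs, relevant, depth=50)
```

In this toy example, documents found by only one system are relevant a quarter of the time, while those found by two or three systems are always relevant. Real TREC runs would of course need many queries and systems before any trend could be read off.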
