Large Test Collection Experiments on an Operational, Interactive System: Okapi at TREC

Abstract: The Okapi system has been used in a series of experiments on the TREC collections, investigating probabilistic models, relevance feedback and query expansion, and interaction issues. Some new probabilistic models have been developed, resulting in simple weighting functions that take account of document length, within-document term frequency, and within-query term frequency. Each of these factors has been shown to be beneficial. Relevance feedback and query expansion are highly beneficial when based on large quantities of relevance data (as in the routing task). Interaction issues are much more difficult to evaluate in the TREC framework, and no benefits have yet been demonstrated from feedback based on small numbers of “relevant” items identified by intermediary searchers.
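The abstract does not state the weighting functions themselves. As a rough illustration of a weight that combines document length, within-document term frequency, and within-query term frequency, the sketch below uses the commonly cited BM25 form associated with Okapi. The function name, argument names, and default values for k1, b, and k3 are assumptions chosen for illustration, not values taken from this abstract.

```python
import math


def bm25_term_weight(tf, qtf, df, num_docs, doc_len, avg_doc_len,
                     k1=1.2, b=0.75, k3=7.0):
    """Weight of one query term in one document (illustrative sketch).

    tf  : within-document term frequency
    qtf : within-query term frequency
    df  : number of documents containing the term
    k1, b, k3 are tuning constants; the defaults here are assumptions.
    """
    # Collection-frequency component (no relevance information assumed).
    idf = math.log((num_docs - df + 0.5) / (df + 0.5))
    # Document-length normalisation: longer-than-average documents raise K,
    # which damps the contribution of tf.
    K = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    # Saturating within-document term-frequency component.
    doc_part = (k1 + 1.0) * tf / (K + tf)
    # Saturating within-query term-frequency component.
    query_part = (k3 + 1.0) * qtf / (k3 + qtf)
    return idf * doc_part * query_part


def score(query_tf, doc_tf, doc_freq, num_docs, doc_len, avg_doc_len):
    """Sum the term weights over query terms that occur in the document."""
    return sum(
        bm25_term_weight(doc_tf[t], qtf, doc_freq[t],
                         num_docs, doc_len, avg_doc_len)
        for t, qtf in query_tf.items()
        if t in doc_tf
    )
```

In this form, a term occurring once in a document of average length, and once in the query, receives exactly the collection-frequency weight; additional occurrences raise the weight with diminishing returns, and longer documents are penalised for the same term frequency.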
