Report on the TREC-4 Experiment: Combining Probabilistic and Vector-Space Schemes

This paper describes and evaluates a retrieval scheme that combines the OKAPI probabilistic retrieval model with various vector-space schemes. In this study, all retrieval strategies represent queries and documents using the same set of single terms, but each strategy weights these terms differently. To combine these search schemes, we do not apply a fixed combination operator (e.g., sum, average, max) to the retrieval status values or to the ranks of the retrieved records. Instead, we believe that each retrieval strategy may perform well for some queries and poorly for others. Thus, based on a given query's statistical characteristics, our search model first selects the most appropriate retrieval scheme and then retrieves information using the selected search mechanism. Since the selection is made before any search operation, our approach has the advantage of limiting the search time to a single retrieval algorithm, instead of retrieving items with several retrieval schemes and then combining their results. In particular, this study addresses the following questions: (1) can the statistical characteristics of a query serve as good predictors in an automatic selection procedure; (2) given the relatively high retrieval effectiveness achieved by the OKAPI model, can various vector-space schemes further improve the retrieval performance of the OKAPI approach; and (3) are the learning results obtained with one test collection valid for another corpus?
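The abstract does not state which query statistics or which learning method drive the selection, so the following is only a minimal sketch under loudly stated assumptions. It assumes two hypothetical query features (the number of distinct query terms and their mean idf), a nearest-centroid classifier trained on queries labeled with their best-performing scheme, and a BM25-style formula and a tf-idf cosine measure standing in for the OKAPI and vector-space schemes. The toy collection, the training labels, and every function name (`query_features`, `select_scheme`, `search`, etc.) are illustrative inventions, not the authors' method or data. What the sketch does show is the key property claimed above: the scheme is chosen before any search, so only one scoring function is evaluated per query.

```python
import math
from collections import Counter

# Hypothetical toy collection: document id -> list of single terms.
DOCS = {
    "d1": ["okapi", "probabilistic", "retrieval", "model"],
    "d2": ["vector", "space", "retrieval", "cosine", "weighting"],
    "d3": ["query", "statistics", "selection", "retrieval"],
}
N = len(DOCS)
DF = Counter(t for terms in DOCS.values() for t in set(terms))  # document frequency
AVGDL = sum(len(terms) for terms in DOCS.values()) / N          # average document length


def idf(term):
    # Plain log(N / df); only called for terms that occur in the collection.
    return math.log(N / DF[term])


def bm25(query, doc, k1=1.2, b=0.75):
    """BM25-style score, standing in for the OKAPI model (k1, b are common defaults)."""
    tf = Counter(doc)
    score = 0.0
    for t in set(query):
        if t not in tf:
            continue
        w = math.log((N - DF[t] + 0.5) / (DF[t] + 0.5) + 1)  # non-negative idf variant
        score += w * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / AVGDL))
    return score


def tfidf_cosine(query, doc):
    """Cosine between tf*idf vectors, standing in for one vector-space scheme."""
    def vec(terms):
        tf = Counter(terms)
        return {t: tf[t] * idf(t) for t in tf if t in DF}
    q, d = vec(query), vec(doc)
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


SCHEMES = {"okapi": bm25, "vector": tfidf_cosine}


def query_features(query):
    # Hypothetical features: number of distinct terms and their mean idf.
    distinct = set(query)
    idfs = [idf(t) for t in distinct if t in DF]
    mean_idf = sum(idfs) / len(idfs) if idfs else 0.0
    return (float(len(distinct)), mean_idf)


def train_centroids(examples):
    # examples: (features, best scheme for that training query) pairs.
    grouped = {}
    for feats, label in examples:
        grouped.setdefault(label, []).append(feats)
    return {label: tuple(sum(col) / len(rows) for col in zip(*rows))
            for label, rows in grouped.items()}


def select_scheme(feats, centroids):
    # Nearest-centroid rule: pick the scheme whose training queries look most similar.
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))


def search(query, centroids):
    scheme = select_scheme(query_features(query), centroids)  # chosen before searching
    score = SCHEMES[scheme]  # only this one scoring function is evaluated
    scores = {d: score(query, terms) for d, terms in DOCS.items()}
    ranked = sorted((d for d in scores if scores[d] > 0), key=scores.get, reverse=True)
    return scheme, ranked


# Hypothetical training data: (n_terms, mean_idf) -> best-performing scheme.
TRAINING = [((2.0, 0.8), "okapi"), ((3.0, 0.6), "okapi"),
            ((8.0, 0.3), "vector"), ((10.0, 0.2), "vector")]

if __name__ == "__main__":
    centroids = train_centroids(TRAINING)
    print(search(["okapi", "probabilistic"], centroids))  # -> ('okapi', ['d1'])
```

Here the short two-term query lands near the "okapi" centroid, so only the BM25-style function is run. A nearest-centroid rule is just one simple choice of classifier; in practice the features would also need to be normalized, since in this sketch the term count dominates the Euclidean distance.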
