Evaluating Retrieval Performance Given Database and Query Characteristics: Analytic Determination of Performance Surfaces

An analytic method for evaluating information retrieval and filtering can quantitatively predict the expected number of documents examined in retrieving a relevant document. It also lets researchers and practitioners understand qualitatively how varying the estimates of query parameter values affects retrieval performance. The method also models how relevance feedback can be incorporated to increase our knowledge about the parameters of relevant documents and the robustness of parameter estimates. Single-term and two-term independence models are developed, along with a complete term dependence model. An economic model of retrieval performance may be used to study the effects of database size, providing analytic answers to questions that compare retrieval from small and large databases and to questions about the number of terms in a query. Results are presented as performance surfaces: three-dimensional graphs showing the effects of two independent variables on performance.
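
To make the idea of a performance surface concrete, here is a minimal sketch for the single-term case. It is an illustration under stated assumptions, not necessarily the formulation used in the paper. It assumes one binary query term, that documents containing the term are ranked ahead of those that lack it, and that ties are broken at random. The names `p` (probability that a relevant document contains the term), `t` (fraction of all documents containing the term), and `n_docs` (database size) are hypothetical and chosen only for this example. Under these assumptions a relevant document with the term sits on average at the middle of the first block of the ranking and one without the term at the middle of the second block, so the mean normalized position is (1 - p + t) / 2.

```python
def expected_position(p, t, n_docs):
    """Mean rank (1-based) of a relevant document in a database of n_docs
    documents, assuming documents containing the term are ranked first
    and ties are broken at random.

    p: probability that a relevant document contains the term (assumed name)
    t: fraction of all documents that contain the term (assumed name)
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= t <= 1.0):
        raise ValueError("p and t must lie in [0, 1]")
    if n_docs < 1:
        raise ValueError("n_docs must be at least 1")
    # Mean normalized position: p * t/2 + (1 - p) * (t + (1 - t)/2)
    a = (1.0 - p + t) / 2.0
    # The +0.5 converts a normalized position into a 1-based mean rank.
    return n_docs * a + 0.5


def performance_surface(p_values, t_values, n_docs):
    """Grid of mean ranks: one row per p value, one column per t value."""
    return [[expected_position(p, t, n_docs) for t in t_values]
            for p in p_values]


if __name__ == "__main__":
    steps = [i / 4 for i in range(5)]  # 0.0, 0.25, 0.5, 0.75, 1.0
    for p, row in zip(steps, performance_surface(steps, steps, 1000)):
        print(f"p={p:.2f}", " ".join(f"{v:7.1f}" for v in row))
```

When `p` equals `t` the term carries no information and the mean rank reduces to that of a random ordering, (n_docs + 1) / 2. The grid returned by `performance_surface` can be passed to any three-dimensional plotting tool, with `p` and `t` as the two independent variables and the mean rank as the height.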
