Paper information - Optimizing Similarity Using Multi-Query Relevance Feedback

We propose a novel method for automatically adjusting parameters in ranked-output text retrieval systems to improve retrieval performance. A ranked-output text retrieval system implements a ranking function that orders documents, placing documents estimated to be more relevant to the user's query before less relevant ones. The system adjusts its parameters to maximize the match between the system's document ordering and a target ordering. The target ordering is typically given by user feedback on a set of sample queries, but is more generally any document preference relation. We demonstrate the utility of the approach by using it to estimate a similarity measure (scoring the relevance of documents to queries) in a vector space model of information retrieval. Experimental results using several collections indicate that the approach automatically finds a similarity measure which performs equivalently to or better than all “classic” similarity measures studied. It also performs within 1% of an estimated optimal measure (found by exhaustive sampling of the similarity measures). The method is compared to two alternative methods: a Perceptron learning rule motivated by Wong and Yao's (1990) Query Formulation method, and a Least Squared learning rule motivated by Fuhr and Buckley's (1991) Probabilistic Learning approach. Though both alternatives have useful characteristics, we demonstrate empirically that neither can be used to estimate the parameters of the optimal similarity measure. © 1998 John Wiley & Sons, Inc.
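The core idea in the abstract, tuning ranking parameters so that the system's ordering agrees with a target preference relation over several sample queries, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the linear scoring function, the normalized pairwise-difference agreement measure, the random hill-climbing search, and all names (`score`, `agreement`, `mean_agreement`, `hill_climb`) are hypothetical stand-ins, not the authors' criterion or optimizer.

```python
import random
from typing import List, Sequence, Tuple

Doc = Sequence[float]                   # hypothetical per-document feature vector
Prefs = List[Tuple[int, int]]           # (i, j): document i is preferred over document j
Query = Tuple[List[Doc], Prefs]         # one sample query: its documents and feedback


def score(theta: Sequence[float], doc: Doc) -> float:
    """Assumed ranking function: a weighted sum of document features."""
    return sum(t * f for t, f in zip(theta, doc))


def agreement(theta: Sequence[float], docs: List[Doc], prefs: Prefs) -> float:
    """Agreement in [-1, 1] between the scores and the preference pairs.

    +1 means every preferred document scores higher; -1 means the reverse.
    """
    num = 0.0
    den = 0.0
    for i, j in prefs:
        diff = score(theta, docs[i]) - score(theta, docs[j])
        num += diff
        den += abs(diff)
    return num / den if den > 0 else 0.0


def mean_agreement(theta: Sequence[float], queries: List[Query]) -> float:
    """Average the agreement over all sample queries."""
    return sum(agreement(theta, d, p) for d, p in queries) / len(queries)


def hill_climb(queries: List[Query], dim: int, steps: int = 200,
               step_size: float = 0.1, seed: int = 0) -> Tuple[List[float], float]:
    """Random hill climbing: keep a perturbed theta only if agreement improves."""
    rng = random.Random(seed)
    theta = [1.0] * dim
    best = mean_agreement(theta, queries)
    for _ in range(steps):
        cand = [t + rng.gauss(0.0, step_size) for t in theta]
        val = mean_agreement(cand, queries)
        if val > best:
            theta, best = cand, val
    return theta, best


# Two toy queries whose feedback both favor the first feature.
docs_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
prefs_a = [(0, 2), (2, 1), (0, 1)]
docs_b = [[0.8, 0.1], [0.2, 0.9]]
prefs_b = [(0, 1)]
queries = [(docs_a, prefs_a), (docs_b, prefs_b)]

theta, best = hill_climb(queries, dim=2)
```

In this toy setup the search only has to discover that the first feature should outweigh the second. A real system would use a richer parameterization of the similarity measure and a more efficient optimizer than random perturbation.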

G. Cottrell | R. Belew | B. Bartell

[1] R. Shepard. The analysis of proximities: Multidimensional scaling with an unknown distance function. II, 1962.

[2] J. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, 1964.

[3] Don R. Swanson, et al. Probabilistic models for automatic indexing, 1974, J. Am. Soc. Inf. Sci.

[4] John H. Holland, et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 1992.

[5] L. Guttman. What is Not What in Statistics, 1977.

[6] Michael McGill, et al. An Evaluation of Factors Affecting Document Ranking by Information Retrieval Systems, 1979.

[7] Gerard Salton, et al. Automatic term class construction using relevance--A summary of work in automatic pseudoclassification, 1980, Inf. Process. Manag.

[8] Libena Vokac, et al. Optimal values of recall and precision, 1982, J. Am. Soc. Inf. Sci.

[9] Michael McGill, et al. Introduction to Modern Information Retrieval, 1983.

[10] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[11] Donna K. Harman, et al. An experimental study of factors important in document ranking, 1986, SIGIR '86.