Paper information - Optimizing ranking functions: a connectionist approach to adaptive information retrieval

This dissertation examines the use of adaptive methods to automatically improve the performance of ranked text retrieval systems. The goal of a ranked retrieval system is to manage a large collection of text documents and to order documents for a user based on the estimated relevance of the documents to the user's information need (or query). The ordering enables the user to quickly find documents of interest. Ranked retrieval is a difficult problem because of the ambiguity of natural language, the large size of the collections, and the varying needs of users and varying collection characteristics. We propose and empirically validate general adaptive methods that improve the ability of a large class of retrieval systems to rank documents effectively. Our main adaptive method is to numerically optimize free parameters in a retrieval system by minimizing a non-metric criterion function. The criterion measures how well the system is ranking documents relative to a target ordering, defined by a set of training queries that include the users' desired document orderings. Thus, the system learns parameter settings that better enable it to rank relevant documents before irrelevant ones. The non-metric approach is interesting because it is a general adaptive method, an alternative to supervised methods for training neural networks in domains in which rank order or prioritization is important. A second adaptive method is also examined, which is applicable to a restricted class of retrieval systems but which permits an analytic solution. The adaptive methods are applied to a number of problems in text retrieval to validate their utility and practical efficiency. The applications include a dimensionality reduction of vector-based document representations to a vector space in which inter-document similarity more accurately predicts semantic association; the estimation of a similarity measure that better predicts the relevance of documents to queries; and the estimation of a high-performance neural network combination of multiple retrieval systems into a single overall system. The applications demonstrate that the approaches improve performance and adapt to varying retrieval environments. We also compare the methods to numerous alternative adaptive methods in the text retrieval literature, with very positive results.
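
To make the main adaptive method concrete, the following minimal sketch shows how a rank-based criterion might be minimized over free parameters. The abstract does not give the criterion's formula, so everything here is an assumption for illustration: the pairwise statistic in `rank_criterion`, the two-weight linear scorer in `linear_scores`, the coarse `grid_search` (a stand-in for whatever numerical optimizer the dissertation uses), and the toy data are all hypothetical.

```python
from itertools import product


def rank_criterion(scores, preferred):
    """Assumed pairwise rank criterion in [-1, 1]; lower is better.

    scores:    dict mapping document id -> system score (higher ranks earlier)
    preferred: set of document ids the user wants ranked ahead of the rest
    """
    others = set(scores) - set(preferred)
    num = den = 0.0
    for good, bad in product(preferred, others):
        diff = scores[good] - scores[bad]
        num += diff        # positive when the pair is ordered as desired
        den += abs(diff)   # normalizer, makes the value scale-invariant
    return -num / den if den else 0.0  # 0.0 when no pair is informative


def linear_scores(weights, features):
    """Score each document as a weighted sum of its feature values."""
    return {doc: sum(w * x for w, x in zip(weights, feats))
            for doc, feats in features.items()}


def grid_search(features, preferred, steps=10):
    """Pick the weight pair (w, 1 - w) with the lowest criterion value."""
    candidates = [(i / steps, 1 - i / steps) for i in range(steps + 1)]
    return min(candidates,
               key=lambda w: rank_criterion(linear_scores(w, features), preferred))


# Hypothetical toy data: two retrieval features per document.
FEATURES = {
    "d1": (0.9, 0.1),
    "d2": (0.2, 0.8),
    "d3": (0.4, 0.3),
    "d4": (0.1, 0.2),
}
PREFERRED = {"d2", "d3"}  # documents the user judged relevant

best = grid_search(FEATURES, PREFERRED)
print(best, rank_criterion(linear_scores(best, FEATURES), PREFERRED))
```

Because the criterion depends only on score differences between preferred and non-preferred documents, it rewards correct ordering rather than matching any particular target score, which is the sense in which such a criterion is rank-based rather than a conventional supervised error. With many parameters, the grid search would be replaced by a gradient-based or other numerical optimizer.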

Brian T. Bartell
