Optimizing ranking functions: a connectionist approach to adaptive information retrieval

This dissertation examines the use of adaptive methods to automatically improve the performance of ranked text retrieval systems. The goal of a ranked retrieval system is to manage a large collection of text documents and to order documents for a user based on the estimated relevance of the documents to the user's information need (or query). The ordering enables the user to quickly find documents of interest. Ranked retrieval is a difficult problem because of the ambiguity of natural language, the large size of the collections, the varying needs of users, and varying collection characteristics. We propose and empirically validate general adaptive methods which improve the ability of a large class of retrieval systems to rank documents effectively. Our main adaptive method is to numerically optimize free parameters in a retrieval system by minimizing a non-metric criterion function. The criterion measures how well the system is ranking documents relative to a target ordering, defined by a set of training queries which include the users' desired document orderings. Thus, the system learns parameter settings which better enable it to rank relevant documents before irrelevant ones. The non-metric approach is interesting because it is a general adaptive method, an alternative to supervised methods for training neural networks in domains in which rank order or prioritization is important. A second adaptive method is also examined, which is applicable to a restricted class of retrieval systems but which permits an analytic solution. The adaptive methods are applied to a number of problems in text retrieval to validate their utility and practical efficiency. The applications include: a dimensionality reduction of vector-based document representations to a vector space in which inter-document similarity more accurately predicts semantic association; the estimation of a similarity measure which better predicts the relevance of documents to queries; and the estimation of a high-performance neural network combination of multiple retrieval systems into a single overall system. The applications demonstrate that the approaches improve performance and adapt to varying retrieval environments. We also compare the methods to numerous alternative adaptive methods in the text retrieval literature, with very positive results.
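
The idea of tuning free parameters so that relevant documents are ranked before irrelevant ones can be illustrated with a small sketch. The abstract does not give the form of the criterion, so everything below is an assumption made for illustration: a linear scoring function, a smooth logistic penalty on each preference pair, plain gradient descent, and made-up toy data. The function names (`pairwise_criterion`, `fit`, `misordered_fraction`) are hypothetical and not taken from the dissertation.

```python
import numpy as np

def pairwise_criterion(w, X, pairs):
    """Mean logistic penalty over (preferred, other) document index pairs.

    Assumed surrogate: small when the preferred document scores higher.
    """
    s = X @ w
    diff = np.array([s[i] - s[j] for i, j in pairs])
    return float(np.mean(np.logaddexp(0.0, -diff)))

def pairwise_gradient(w, X, pairs):
    """Gradient of pairwise_criterion with respect to the parameters w."""
    s = X @ w
    grad = np.zeros_like(w)
    for i, j in pairs:
        d = s[i] - s[j]
        # d/dd log(1 + exp(-d)) = -1 / (1 + exp(d))
        grad += -(X[i] - X[j]) / (1.0 + np.exp(d))
    return grad / len(pairs)

def fit(X, pairs, steps=500, lr=0.5):
    """Plain gradient descent from w = 0 (step count and rate are arbitrary)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * pairwise_gradient(w, X, pairs)
    return w

def misordered_fraction(w, X, pairs):
    """Fraction of preference pairs the current scores fail to order correctly."""
    s = X @ w
    return sum(s[i] <= s[j] for i, j in pairs) / len(pairs)

# Toy data (hypothetical): 4 documents with 2 features each;
# documents 0 and 1 are relevant to the training query, 2 and 3 are not.
X = np.array([[1.0, 0.2], [0.8, 0.1], [0.3, 0.9], [0.1, 0.7]])
pairs = [(r, n) for r in (0, 1) for n in (2, 3)]

w = fit(X, pairs)
print("criterion before:", pairwise_criterion(np.zeros(2), X, pairs))
print("criterion after: ", pairwise_criterion(w, X, pairs))
print("misordered pairs:", misordered_fraction(w, X, pairs))
```

The training signal here is only the desired order of document pairs, not target score values, which is the sense in which such a criterion differs from ordinary supervised regression targets. The logistic penalty is one smooth stand-in among many; a different surrogate could be swapped into `pairwise_criterion` without changing the rest of the sketch.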
