Computational Methods for Intelligent Information Access

Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between the words in users’ requests and the words in, or assigned to, documents in a database. Because people use a tremendous diversity of words to describe the same document, lexical methods are necessarily incomplete and imprecise. Computing the singular value decomposition (SVD) of a large sparse term-by-document matrix makes it possible to exploit the implicit higher-order structure in the association of terms with documents. Terms and documents are represented by the singular vectors corresponding to the 200 to 300 largest singular values, and are then matched against user queries. We call this retrieval method Latent Semantic Indexing (LSI) because the resulting subspace represents important associative relationships between terms and documents that are not evident in individual documents. LSI is a completely automatic yet intelligent indexing method; it is widely applicable and is a promising way to improve users’ access to many kinds of textual materials, or to documents and services for which textual descriptions are available. We survey the computational requirements for managing LSI-encoded databases, as well as current and future applications of LSI.
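The retrieval idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the five-term, five-document toy matrix, the helper names `lsi_fit`, `fold_in_query`, and `cosine_scores`, the choice of k = 2 (rather than 200 to 300), the use of a dense SVD from NumPy in place of a sparse solver, and the common convention of projecting a query q as q^T U_k S_k^{-1} and ranking documents by cosine similarity.

```python
import numpy as np

def lsi_fit(A, k):
    """Rank-k truncated SVD of a term-by-document matrix A (terms x docs)."""
    # Dense SVD is only reasonable for a toy matrix; a large sparse
    # collection would need a sparse, truncated solver instead.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k largest singular triplets; rows of V_k are document vectors.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_query(q, U_k, s_k):
    """Project a term-count query vector into the k-dimensional subspace."""
    return (q @ U_k) / s_k

def cosine_scores(q_hat, V_k):
    """Cosine similarity between the projected query and each document."""
    denom = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    return (V_k @ q_hat) / denom

# Hypothetical toy collection: rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [1, 1, 1, 0, 0],  # engine
    [0, 0, 0, 1, 1],  # flower
    [0, 0, 0, 1, 1],  # petal
], dtype=float)

U_k, s_k, V_k = lsi_fit(A, k=2)

# Query containing only the word "car".
q = np.array([1, 0, 0, 0, 0], dtype=float)
scores = cosine_scores(fold_in_query(q, U_k, s_k), V_k)

for doc_id, score in enumerate(scores):
    print(f"doc {doc_id}: {score:.3f}")
```

In this toy collection, document 1 contains "automobile" and "engine" but never "car", so a purely lexical match on the query would miss it. In the two-dimensional subspace it scores as highly as the documents that do contain "car", while the two flower documents score near zero.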
