Computational Methods for Intelligent Information Access

Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between the words in users’ requests and the words contained in, or assigned to, documents in a database. Because people use a tremendous diversity of words to describe the same document, lexical methods are necessarily incomplete and imprecise. By computing the singular value decomposition (SVD) of large sparse term-by-document matrices, one can take advantage of the implicit higher-order structure in the association of terms with documents. Terms and documents are represented by the 200 to 300 largest singular vectors, and user queries are then matched against them in that reduced subspace. We call this retrieval method Latent Semantic Indexing (LSI) because the subspace represents important associative relationships between terms and documents that are not evident in individual documents. LSI is a completely automatic yet intelligent indexing method that is widely applicable, and it is a promising way to improve users’ access to many kinds of textual materials, or to documents and services for which textual descriptions are available. We survey the computational requirements for managing LSI-encoded databases, as well as current and future applications of LSI.
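
The retrieval scheme described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the five-term, four-document matrix is a toy, the function names `lsi_fit`, `fold_in_query`, and `cosine_scores` are hypothetical, the dense `numpy.linalg.svd` stands in for the sparse solvers a large database would need, and the query projection q^T U_k S_k^{-1} is the commonly described "folding-in" form rather than something stated in this abstract.

```python
import numpy as np

def lsi_fit(A, k):
    """Truncated SVD of a term-by-document matrix A (terms x docs)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k largest singular triplets; rows of V_k are document vectors.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_query(q, U_k, s_k):
    """Project a term-space query into the k-dimensional subspace."""
    return (q @ U_k) / s_k

def cosine_scores(q_hat, V_k):
    """Cosine similarity between the projected query and each document."""
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    return (V_k @ q_hat) / np.where(norms == 0, 1.0, norms)

# Toy data (assumed): rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],   # car        appears in doc 0
    [0, 1, 0, 0],   # automobile appears in doc 1
    [1, 1, 0, 0],   # engine     appears in docs 0 and 1
    [0, 0, 1, 1],   # flower     appears in docs 2 and 3
    [0, 0, 1, 0],   # petal      appears in doc 2
], dtype=float)

U_k, s_k, V_k = lsi_fit(A, k=2)

# Query containing only the term "car".
q = np.zeros(len(terms))
q[terms.index("car")] = 1.0

sims = cosine_scores(fold_in_query(q, U_k, s_k), V_k)
ranking = np.argsort(-sims)
print("similarities:", np.round(sims, 3))
print("ranking:", ranking)
```

In this toy, document 1 never contains the word "car", so a purely lexical match would miss it, yet it shares "engine" with document 0 and therefore lands close to the query in the reduced subspace. The real method keeps 200 to 300 dimensions; k = 2 is used here only because the example matrix is tiny.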
