Lexical-matching methods for information retrieval can be inaccurate when they are used to match a user's query. Typically, information is retrieved by literally matching terms in documents with those of the query. The problem is that users want to retrieve documents on the basis of their conceptual topic or meaning. There are usually many ways to express a given concept (synonymy), so the literal terms in a user's query may not match those of a relevant document. In addition, most words have multiple meanings (polysemy), so terms in a user's query will literally match terms in irrelevant documents. The implicit high-order structure in the association of terms with documents can be exploited by the singular value decomposition (SVD). Latent Semantic Indexing (LSI) is a conceptual indexing technique that uses the SVD to estimate the underlying latent semantic structure of the word-to-document association. By computing a lower-rank approximation to the original term-document matrix, LSI dampens the effects of word-choice variability by representing terms and documents using the (orthogonal) left and right singular vectors. Current methods for adding new text to an LSI database can have deteriorating effects on the orthogonality of the vectors used to represent terms and documents in high-dimensional subspaces. Updating the SVD so as to preserve the orthogonality among document vectors corresponding to the new term-document matrix is one remedy. Recomputing the SVD of the entire new term-document matrix can be avoided by applying SVDPACKC routines to appropriate submatrices constructed from existing term and document vectors and similar vectors corresponding to the new text. The cost of the numerical computations needed to update the SVD versus the potential inaccuracy of simply folding-in text presents an interesting tradeoff for LSI database management.
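The tradeoff between folding-in and SVD-updating can be illustrated with a small numerical sketch. The abstract gives no formulas, so the code below rests on assumptions: folding-in is taken to be the usual projection of a new document vector d onto the existing term space (d^T U_k Sigma_k^{-1}), and the update step uses one commonly described form, in which the SVD of the small matrix [Sigma_k | U_k^T D] replaces the SVD of the full new term-document matrix. Dense NumPy routines stand in for the sparse SVDPACKC routines, and all function names are hypothetical.

```python
import numpy as np


def lsi_fit(A, k):
    """Rank-k truncated SVD of a term-document matrix A (terms x documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in_documents(Uk, sk, Vk, D):
    """Fold in new document columns D without touching Uk or sk.

    Assumed projection: d_hat = d^T Uk Sigma_k^{-1}. The new rows are simply
    appended to Vk, so the columns of the result are no longer orthonormal.
    """
    D_hat = (D.T @ Uk) / sk
    return np.vstack([Vk, D_hat])


def update_documents(Uk, sk, Vk, D):
    """Update the rank-k factors with new document columns D.

    Assumed form: take the SVD of the small k x (k + p) matrix
    B = [Sigma_k | Uk^T D] and rotate the existing factors by its
    singular vectors, which keeps both factors orthonormal.
    """
    k, p, n = sk.size, D.shape[1], Vk.shape[0]
    B = np.hstack([np.diag(sk), Uk.T @ D])
    Ub, sb, Vbt = np.linalg.svd(B, full_matrices=False)
    V_aug = np.block([
        [Vk, np.zeros((n, p))],
        [np.zeros((p, k)), np.eye(p)],
    ])
    return Uk @ Ub, sb, V_aug @ Vbt.T


def orthogonality_error(V):
    """Frobenius norm of V^T V - I; zero when the columns are orthonormal."""
    return np.linalg.norm(V.T @ V - np.eye(V.shape[1]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((8, 6))   # toy data: 8 terms, 6 documents
    D = rng.random((8, 2))   # toy data: 2 new documents
    Uk, sk, Vk = lsi_fit(A, k=3)

    V_fold = fold_in_documents(Uk, sk, Vk, D)
    _, _, V_upd = update_documents(Uk, sk, Vk, D)

    print("fold-in orthogonality error:", orthogonality_error(V_fold))
    print("update  orthogonality error:", orthogonality_error(V_upd))
```

In this sketch folding-in costs only a matrix-vector product per new document but leaves V^T V different from the identity, whereas the update requires an SVD of a k x (k + p) matrix and restores orthonormal document vectors.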