Low-rank Orthogonal Decompositions for Information Retrieval Applications

Current methods to index and retrieve documents from databases usually depend on a lexical match between query terms and keywords extracted from documents in a database. These methods can produce incomplete or irrelevant results due to the use of synonyms and polysemous words. The association of terms with documents (or implicit semantic structure) can be derived using large sparse term-by-document matrices. In fact, both terms and documents can be matched with user queries using representations in k-space (where 100 ≤ k ≤ 200) derived from k of the largest approximate singular vectors of these term-by-document matrices. This completely automated approach, called Latent Semantic Indexing (LSI), uses subspaces spanned by the approximate singular vectors to encode important associative relationships between terms and documents in k-space. Using LSI, two or more documents may be close to each other in k-space (and hence in meaning) yet share no common terms. The focus of this work is to demonstrate the computational advantages of exploiting low-rank orthogonal decompositions such as the ULV (or URV), as opposed to the truncated singular value decomposition (SVD), for the construction of initial and updated rank-k subspaces arising from LSI applications.
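The claim that two documents can be close in k-space while sharing no terms can be illustrated with a minimal sketch. It uses the truncated SVD from NumPy, not the ULV or URV decompositions studied in this work, and everything in it is an assumption made for illustration: the six-term, five-document matrix, the helper names lsi_document_vectors and cosine, and the choice k = 2 (far below the 100 to 200 range used in practice).

```python
import numpy as np

# Hypothetical toy term-by-document matrix (rows = terms, columns = documents).
A = np.array([
    # d0 d1 d2 d3 d4
    [1, 0, 1, 0, 0],  # car
    [1, 0, 1, 0, 0],  # engine
    [0, 1, 1, 0, 0],  # automobile
    [0, 1, 1, 0, 0],  # motor
    [0, 0, 0, 1, 1],  # flower
    [0, 0, 0, 1, 0],  # petal
], dtype=float)


def lsi_document_vectors(A, k):
    """Return one row per document: its coordinates in the rank-k subspace."""
    # Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k dominant right singular vectors, scaled by their singular values.
    return (Vt[:k, :] * s[:k, None]).T


def cosine(x, y):
    """Cosine similarity between two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


docs_k = lsi_document_vectors(A, k=2)

# d0 ("car engine") and d1 ("automobile motor") share no terms,
# but both co-occur with the terms of d2.
raw_sim = cosine(A[:, 0], A[:, 1])              # lexical match only
lsi_sim = cosine(docs_k[0], docs_k[1])          # match in k-space
lsi_unrelated = cosine(docs_k[0], docs_k[3])    # d0 versus a "flower" document

print(f"raw cosine(d0, d1) = {raw_sim:.3f}")
print(f"LSI cosine(d0, d1) = {lsi_sim:.3f}")
print(f"LSI cosine(d0, d3) = {lsi_unrelated:.3f}")
```

In this toy collection the lexical similarity of d0 and d1 is zero, while their k-space vectors are nearly parallel because d2 links the two vocabularies. The unrelated document d3 stays nearly orthogonal to d0.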
