Information retrieval using a singular value decomposition model of latent semantic structure

In a new method for automatic indexing and retrieval, implicit higher-order structure in the association of terms with documents is modeled to improve estimates of term-document association, and therefore the detection of relevant documents on the basis of terms found in queries. Singular-value decomposition is used to decompose a large term-by-document matrix into 50 to 150 orthogonal factors from which the original matrix can be approximated by linear combination; both documents and terms are represented as vectors in a 50- to 150-dimensional space. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents are ordered by their similarity to the query. Initial tests find this automatic method very promising.
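
The pipeline described in the abstract (truncated SVD of a term-by-document matrix, a query folded in as a pseudo-document, documents ordered by similarity) can be illustrated with a minimal sketch. Several details here are assumptions that the abstract does not specify: the toy vocabulary and matrix, the use of raw term counts as weights, the fold-in formula `q @ U_k / s_k`, cosine as the similarity measure, and k = 2 in place of the 50 to 150 factors used on real collections. The function names `lsi_fit`, `fold_in_query`, and `rank_documents` are hypothetical.

```python
import numpy as np

def lsi_fit(X, k):
    """Truncated SVD of a term-by-document matrix X (rows: terms, cols: docs)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the k largest factors: term vectors, singular values, document vectors.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_query(q, U_k, s_k):
    """Map a term-weight vector q to a pseudo-document in the k-dim space.
    Assumed formula: q_hat = q^T U_k S_k^{-1}."""
    return (q @ U_k) / s_k

def rank_documents(q_hat, V_k, s_k):
    """Order documents by cosine similarity to the query (assumed measure),
    with both sides scaled by the singular values."""
    docs = V_k * s_k
    query = q_hat * s_k
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(query) + 1e-12
    sims = (docs @ query) / norms
    return np.argsort(-sims), sims

# Toy collection (assumed data). Rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
X = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)

U_k, s_k, V_k = lsi_fit(X, k=2)

# Query containing only the term "car".
q = np.zeros(len(terms))
q[terms.index("car")] = 1.0

order, sims = rank_documents(fold_in_query(q, U_k, s_k), V_k, s_k)
print("ranking:", order)
print("similarities:", np.round(sims, 3))
```

In this toy collection the second document contains "automobile" and "engine" but not "car", yet it lands next to the first document in the reduced space because the two share "engine". This is the kind of indirect term-document association the method aims to capture; literal term matching would miss it.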
