Information Retrieval using a Singular Value Decomposition Model of Latent Semantic Structure

A new method for automatic indexing and retrieval models the implicit higher-order structure in the association of terms with documents, improving estimates of term-document association and therefore the detection of relevant documents on the basis of terms found in queries. Singular-value decomposition is used to decompose a large term-by-document matrix into 50 to 150 orthogonal factors from which the original matrix can be approximated by linear combination; both documents and terms are represented as vectors in a 50- to 150-dimensional space. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents are ordered by their similarity to the query. Initial tests find this automatic method very promising.
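The pipeline described above (decompose, truncate, fold in the query, rank) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the five-term, four-document matrix is invented toy data, the function names `lsi_fit`, `fold_in_query`, and `rank_documents` are hypothetical, k = 2 stands in for the 50 to 150 factors mentioned in the abstract, and the choices of raw counts as weights, cosine as the similarity measure, and scaling by the singular values before comparison are assumptions the abstract does not specify.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix, keeping k factors."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Rows of u[:, :k] are term vectors; rows of vt[:k].T are document vectors.
    return u[:, :k], s[:k], vt[:k, :].T

def fold_in_query(query_weights, u_k, s_k):
    """Place a query in the factor space as a pseudo-document:
    a weighted combination of its term vectors, rescaled by 1/sigma."""
    return (query_weights @ u_k) / s_k

def rank_documents(query_vec, doc_vecs, s_k):
    """Order documents by cosine similarity to the query.
    Scaling both sides by the singular values is an assumption of this sketch."""
    q = query_vec * s_k
    d = doc_vecs * s_k
    sims = (d @ q) / (np.linalg.norm(d, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims), sims

# Hypothetical toy collection: rows are terms, columns are documents d0..d3.
terms = ["car", "automobile", "engine", "flower", "petal"]
term_doc = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 1],   # petal
], dtype=float)

u_k, s_k, doc_vecs = lsi_fit(term_doc, k=2)

# Query containing only the term "car".
query = np.array([1, 0, 0, 0, 0], dtype=float)
query_vec = fold_in_query(query, u_k, s_k)
order, sims = rank_documents(query_vec, doc_vecs, s_k)
```

In this toy collection, d1 contains "automobile" and "engine" but never "car", yet in the reduced space it scores about as high as d0 for the query "car", because both documents load on the same factor. Plain term matching would miss d1; recovering that kind of indirect association is what the abstract means by higher-order structure.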
