Improving text retrieval for the routing problem using latent semantic indexing

Latent Semantic Indexing (LSI) is a novel approach to information retrieval that attempts to model the underlying structure of term associations. It transforms the traditional representation of documents as vectors of weighted term frequencies into a new coordinate space in which both documents and terms are represented as linear combinations of underlying semantic factors. In previous research, LSI has produced a small improvement in retrieval performance. In this paper, we apply LSI to the routing task, which operates under the assumption that a sample of relevant and non-relevant documents is available for use in constructing the query. Once again, LSI slightly improves performance. However, when LSI is used in conjunction with statistical classification, there is a dramatic improvement in performance.
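The transformation described above is usually written as a truncated singular value decomposition. The block below is a sketch of that standard formulation, not necessarily the exact variant used in this paper. The symbols are assumptions introduced here for illustration: $X$ is the term-by-document matrix of weighted term frequencies, $k$ is the number of retained semantic factors, and $d$ is the term vector of a document (or query) to be mapped into the reduced space.

```latex
% Rank-k approximation of the t x n term-by-document matrix X
X = U \Sigma V^{\top} \;\approx\; X_k = U_k \Sigma_k V_k^{\top},
\qquad U_k \in \mathbb{R}^{t \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k}

% Mapping a term vector d (a document or a query) into the k-dimensional factor space
\hat{d} = \Sigma_k^{-1} U_k^{\top} d

% Similarity between a query and a document in the factor space (cosine measure)
\operatorname{sim}(\hat{q}, \hat{d}) =
  \frac{\hat{q}^{\top} \hat{d}}{\lVert \hat{q} \rVert \, \lVert \hat{d} \rVert}
```

In this formulation each row of $U_k$ expresses a term, and each row of $V_k$ a document, as a combination of the same $k$ factors, which is why terms and documents can be compared in a single space.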
