Optimising the Heuristics in Latent Semantic Indexing for Effective Information Retrieval

Latent Semantic Indexing (LSI) is a well-known Information Retrieval (IR) technique that tries to overcome the problems of lexical matching by using conceptual indexing. LSI is a variant of the vector space model and has been reported to be about 30% more effective than the standard vector space model. Many studies have reported that good retrieval performance is closely tied to the choice of retrieval heuristics. In this paper, we focus on optimising two LSI retrieval heuristics: term weighting and rank approximation. Our results show that LSI performance improves significantly when optimised term weighting is combined with an optimised rank approximation.
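As a rough illustration of where the two heuristics enter the LSI pipeline, the sketch below weights a term-by-document matrix, keeps only the k largest singular triplets of its SVD (the rank approximation), projects the documents and a query into that k-dimensional concept space, and scores them by cosine similarity. It is a minimal sketch under stated assumptions, not the method evaluated in this paper: the tf-idf weighting with idf = log(N/df), the function name `lsi_scores`, and the toy corpus are hypothetical choices made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def lsi_scores(counts: np.ndarray, query_counts: np.ndarray, k: int) -> np.ndarray:
    """Score documents against a query with rank-k LSI (hypothetical helper).

    counts:       raw term-by-document frequencies, shape (n_terms, n_docs)
    query_counts: raw term frequencies of the query, shape (n_terms,)
    k:            number of singular triplets kept (the rank approximation)
    Returns one cosine score per document.
    """
    n_terms, n_docs = counts.shape
    if not 1 <= k <= min(n_terms, n_docs):
        raise ValueError("k must lie between 1 and min(n_terms, n_docs)")

    # Term weighting (an assumption here): raw tf times idf = log(n_docs / df).
    df = np.count_nonzero(counts, axis=1)
    idf = np.log(n_docs / np.maximum(df, 1))
    A = counts * idf[:, np.newaxis]
    q = query_counts * idf

    # Rank-k approximation: keep the k left singular vectors with largest singular values.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]

    # Project documents and the query onto the same k-dimensional concept space.
    docs_k = U_k.T @ A  # shape (k, n_docs); column j equals Sigma_k times row j of V_k
    q_k = U_k.T @ q     # shape (k,)

    # Cosine similarity in concept space; zero vectors get a score of 0.
    norms = np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q_k)
    dots = q_k @ docs_k
    return np.divide(dots, norms, out=np.zeros(n_docs), where=norms > 0)


# Hypothetical toy corpus. Rows: car, automobile, engine, flower. Columns: d0, d1, d2.
counts = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 2]])
query = np.array([1, 0, 0, 0])  # the query "car"

print(lsi_scores(counts, query, k=2))  # d0 and d1 score about 1.0, d2 about 0.0
print(lsi_scores(counts, query, k=3))  # full rank: d1 drops to about 0.0
```

With k = 2, document d1 scores close to 1 for the query "car" even though it contains only "automobile": d0 and d1 share "engine" and collapse onto the same concept direction, whereas plain vector-space cosine would give d1 a score of 0. Keeping all three singular triplets (full rank for this 4 × 3 matrix) removes that effect, which is one way to see why the choice of k behaves as a retrieval heuristic worth tuning.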
