Enhanced approach for latent semantic indexing using wavelet transform

Latent semantic indexing (LSI) is a technique used for intelligent information retrieval (IR). It can be used as an alternative to traditional keyword matching IR and is attractive in this respect because of its ability to overcome problems with synonymy and polysemy. This study investigates various aspects of LSI: the effect of the Haar wavelet transform (HWT) as a preprocessing step for the singular value decomposition (SVD) in the key stage of the LSI process; and the effect of different threshold types in the HWT on the search results. The developed method allows the visualisation and processing of the term document matrix, generated in the LSI process, using HWT. The results have shown that precision can be increased by applying the HWT as a preprocessing step, with better results for hard thresholding than soft thresholding, whereas standard SVD-based LSI remains the most effective way of searching in terms of recall value.

[1]  Ronald A. DeVore,et al.  Image compression through wavelet transform coding , 1992, IEEE Trans. Inf. Theory.

[2]  David L. Donoho,et al.  De-noising by soft-thresholding , 1995, IEEE Trans. Inf. Theory.

[3]  Christopher J. Fox,et al.  Lexical Analysis and Stoplists , 1992, Information Retrieval: Data Structures & Algorithms.

[4]  Tamara G. Kolda,et al.  A semidiscrete matrix decomposition for latent semantic indexing information retrieval , 1998, TOIS.

[5]  Ioannis Delakis,et al.  Wavelet-based de-noising algorithm for images acquired with parallel magnetic resonance imaging (MRI) , 2007, Physics in medicine and biology.

[6]  David Salesin,et al.  Wavelets for computer graphics: a primer.1 , 1995, IEEE Computer Graphics and Applications.

[7]  Sheau-Dong Lang,et al.  A neural network model for information retrieval using latent semantic indexing , 1996, Proceedings of International Conference on Neural Networks (ICNN'96).

[8]  William B. Frakes,et al.  Stemming Algorithms , 1992, Information Retrieval: Data Structures & Algorithms.

[9]  Ricardo Baeza-Yates,et al.  Information Retrieval: Data Structures and Algorithms , 1992 .

[10]  Elizabeth R. Jessup,et al.  Matrices, Vector Spaces, and Information Retrieval , 1999, SIAM Rev..

[11]  Susan T. Dumais,et al.  Using Linear Algebra for Intelligent Information Retrieval , 1995, SIAM Rev..

[12]  Abbes Amira,et al.  TDM modeling and evaluation of different domain transforms for LSI , 2009, Neurocomputing.

[13]  Michael W. Berry,et al.  Large-Scale Information Retrieval with Latent Semantic Indexing , 1997, Inf. Sci..

[14]  Jing Gao,et al.  Clustered SVD strategies in latent semantic indexing , 2005, Inf. Process. Manag..

[15]  Martin Vetterli,et al.  Adaptive wavelet thresholding for image denoising and compression , 2000, IEEE Trans. Image Process..