Kernel latent semantic analysis using an information retrieval based kernel

Latent semantic analysis (LSA) can uncover hidden term relationships within a document collection, and these relationships can be used to assist information retrieval. LSA uses the inner product as its similarity function, which unfortunately introduces bias due to document length and term rarity into the term relationships. In this article, we present a novel kernel-based LSA method that uses separate document and query kernel functions, rather than the inner product, to compute document and query similarities. We show that an appropriate kernel function provides a better fit to the data and hence produces more effective term relationships.
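The document-length bias mentioned above can be illustrated with a minimal sketch. It builds the document-document similarity (Gram) matrix that an LSA-style eigendecomposition would operate on, first with the inner product and then with a length-normalised kernel. The cosine kernel here is only a stand-in chosen for illustration; it is not the information retrieval based kernel proposed in the article, and the toy term counts and all function names are assumptions.

```python
import math

def inner_product(x, y):
    """Plain inner product, the similarity used by standard LSA."""
    return sum(a * b for a, b in zip(x, y))

def cosine_kernel(x, y):
    """Length-normalised stand-in kernel (an assumption, not the article's kernel)."""
    norm = math.sqrt(inner_product(x, x)) * math.sqrt(inner_product(y, y))
    return inner_product(x, y) / norm if norm > 0 else 0.0

def gram_matrix(docs, kernel):
    """Pairwise document similarities under the given kernel."""
    return [[kernel(a, b) for b in docs] for a in docs]

def leading_eigenvector(matrix, iterations=200):
    """Power iteration: approximates the dominant latent dimension of the Gram matrix."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy term-count vectors (hypothetical data). The second document has the
# same term proportions as the first but is ten times longer.
docs = [
    [1, 1, 0],
    [10, 10, 0],
    [0, 1, 1],
]

inner_gram = gram_matrix(docs, inner_product)
cosine_gram = gram_matrix(docs, cosine_kernel)

inner_top = leading_eigenvector(inner_gram)
cosine_top = leading_eigenvector(cosine_gram)

print("inner product, leading eigenvector:", [round(x, 3) for x in inner_top])
print("cosine kernel, leading eigenvector:", [round(x, 3) for x in cosine_top])
```

Under the inner product, the long document dominates the leading latent dimension purely because of its length. Under the normalised kernel, the first two documents, which have identical term proportions, receive equal weight. Any other kernel could be passed to `gram_matrix` in the same way, provided it yields a symmetric positive semi-definite matrix.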
