Latent semantic analysis for multiple-type interrelated data objects

Co-occurrence data is common in real applications. Latent Semantic Analysis (LSA) has been used successfully to identify semantic relations in such data. However, LSA can only handle a single co-occurrence relationship between two types of objects. In practice, there are many cases where multiple types of objects exist and any two of these types may have a pairwise co-occurrence relation. All of these co-occurrence relations can be exploited to alleviate data sparseness or to represent objects more meaningfully. In this paper, we propose a novel algorithm, M-LSA, which conducts latent semantic analysis by incorporating all pairwise co-occurrences among multiple types of objects. Based on the mutual reinforcement principle, M-LSA identifies the most salient concepts in the co-occurrence data and represents all the objects in a unified semantic space. M-LSA is general, and we show that several variants of LSA are special cases of our algorithm. Experimental results show that M-LSA outperforms LSA on multiple applications, including collaborative filtering, text clustering, and text categorization.
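
To make the contrast between a single co-occurrence relation and several pairwise relations concrete, the sketch below shows one plausible reading of the idea; it is not the M-LSA algorithm as specified in the paper. It assumes that the pairwise co-occurrence matrices are stacked into one symmetric block matrix and that objects of every type are embedded with its leading eigenvectors. The function names `lsa` and `unified_embedding`, the `sizes` and `relations` arguments, and the absence of any weighting or normalization are all assumptions made for illustration.

```python
import numpy as np


def lsa(cooc, k):
    """Classic LSA: rank-k truncated SVD of a single co-occurrence matrix."""
    u, s, vt = np.linalg.svd(cooc, full_matrices=False)
    # Row objects and column objects, each scaled by the singular values.
    return u[:, :k] * s[:k], vt[:k].T * s[:k]


def unified_embedding(sizes, relations, k):
    """Embed objects of several types in one k-dimensional space.

    sizes:     number of objects of each type, e.g. [n_users, n_items, n_terms]
    relations: {(i, j): matrix of shape (sizes[i], sizes[j])} with i < j
    Hypothetical construction: symmetric block matrix + top-k eigenvectors.
    """
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    total = offsets[-1]
    block = np.zeros((total, total))
    for (i, j), m in relations.items():
        # Place each pairwise relation and its transpose off the diagonal.
        block[offsets[i]:offsets[i + 1], offsets[j]:offsets[j + 1]] = m
        block[offsets[j]:offsets[j + 1], offsets[i]:offsets[i + 1]] = m.T
    vals, vecs = np.linalg.eigh(block)       # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]         # indices of the k largest
    emb = vecs[:, top]
    per_type = [emb[offsets[t]:offsets[t + 1]] for t in range(len(sizes))]
    return per_type, vals[top]
```

With only two types and one relation, the leading eigenvectors of this block matrix are the stacked left and right singular vectors of that relation, so the sketch reduces to the LSA subspace, which is consistent with the abstract's remark that LSA variants arise as special cases.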
