Unsupervised Learning by Probabilistic Latent Semantic Analysis

This paper presents a novel statistical method for factor analysis of binary and count data that is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter method, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed technique uses a generative latent class model to perform a probabilistic mixture decomposition. This results in a more principled approach with a solid foundation in statistical inference. More precisely, we propose to fit the model with a temperature-controlled version of the Expectation Maximization algorithm, which has shown excellent performance in practice. Probabilistic Latent Semantic Analysis has many applications, most prominently in information retrieval, natural language processing, machine learning from text, and related areas. The paper presents perplexity results for different types of text and linguistic data collections and discusses an application in automated document indexing. The experiments indicate substantial and consistent improvements of the probabilistic method over standard Latent Semantic Analysis.
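The abstract's two core ingredients are the latent class (aspect) model P(d, w) = sum_z P(z) P(d|z) P(w|z) and a tempered EM procedure for fitting it. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the function name plsa_tempered_em, the dense document-term count matrix counts, and the fixed inverse temperature beta are choices made here for brevity.

    # Minimal sketch of PLSA fitted by tempered EM (assumptions noted above).
    import numpy as np

    def plsa_tempered_em(counts, n_topics=16, beta=0.8, n_iter=50, seed=0):
        """Estimate P(z), P(d|z), P(w|z) from a (n_docs, n_words) count matrix.
        beta = 1 recovers standard EM; beta < 1 tempers the E-step posterior."""
        rng = np.random.default_rng(seed)
        n_docs, n_words = counts.shape

        # Random initialisation of the multinomial parameters.
        p_z = np.full(n_topics, 1.0 / n_topics)                                   # P(z)
        p_d_z = rng.random((n_topics, n_docs))
        p_d_z /= p_d_z.sum(axis=1, keepdims=True)                                 # P(d|z)
        p_w_z = rng.random((n_topics, n_words))
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)                                 # P(w|z)

        for _ in range(n_iter):
            # E-step: tempered posterior P(z|d,w) proportional to [P(z) P(d|z) P(w|z)]^beta
            joint = (p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]) ** beta
            post = joint / np.maximum(joint.sum(axis=0, keepdims=True), 1e-12)    # shape (z, d, w)

            # M-step: re-estimate parameters from expected counts n(d,w) * P(z|d,w)
            weighted = post * counts[None, :, :]
            p_w_z = weighted.sum(axis=1)
            p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
            p_d_z = weighted.sum(axis=2)
            p_d_z /= np.maximum(p_d_z.sum(axis=1, keepdims=True), 1e-12)
            p_z = weighted.sum(axis=(1, 2))
            p_z /= max(p_z.sum(), 1e-12)

        return p_z, p_d_z, p_w_z

In the paper, the temperature is not fixed: tempered EM starts at beta = 1 and lowers it when held-out performance degrades, which is what controls overfitting. The constant beta here only keeps the sketch short; the dense (topics x docs x words) posterior is likewise for clarity rather than scalability.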
