Paper Information - Topic Mixture Model for Document Representation

Topic Mixture Model for Document Representation

In automatic text processing tasks, documents are usually represented in the bag-of-words space. However, this representation does not take into account the possible relations between words. We review a family of document density estimation models for representing documents. Within this family we derive another possible model: the Theme Topic Mixture Model (TTMM). This model assumes two types of relations among textual data: topics link words to each other, and themes gather documents with a particular distribution over the topics. An experiment reports the performance of the different models in this family on a common task.

Samy Bengio | Mikaela Keller
