Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization

Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. In this tutorial we introduce a novel non-Bayesian approach called Additive Regularization of Topic Models (ARTM). ARTM is free of redundant probabilistic assumptions and provides simple inference for many combined and multi-objective topic models.
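As a minimal sketch (assuming the standard notation of the ARTM literature, with $\Phi = (\phi_{wt})$ the word-in-topic matrix and $\Theta = (\theta_{td})$ the topic-in-document matrix), ARTM maximizes the PLSA log-likelihood plus a weighted sum of regularizers $R_i$ with nonnegative coefficients $\tau_i$:

% ARTM objective: log-likelihood + additive combination of regularizers
\sum_{d \in D} \sum_{w \in d} n_{dw} \ln \sum_{t \in T} \phi_{wt}\,\theta_{td}
\;+\; \sum_{i} \tau_i R_i(\Phi, \Theta) \;\to\; \max_{\Phi,\Theta},

subject to $\Phi$ and $\Theta$ being column-stochastic. The regularized EM algorithm then modifies the usual M-step updates as

% M-step with regularizer gradients; (x)_+ = max(x, 0)
\phi_{wt} \propto \Bigl( n_{wt} + \phi_{wt}\,\tfrac{\partial R}{\partial \phi_{wt}} \Bigr)_{+},
\qquad
\theta_{td} \propto \Bigl( n_{td} + \theta_{td}\,\tfrac{\partial R}{\partial \theta_{td}} \Bigr)_{+},

where $(x)_+ = \max(x, 0)$; setting all $\tau_i = 0$ recovers plain PLSA.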
