Bayesian Sparse Topical Coding

Sparse topic models (STMs) are widely used to learn semantically rich, sparse latent representations of short texts at large scale, mainly by imposing sparse priors or appropriate regularizers on topic models. However, it is difficult for these STMs to model the sparse structure and patterns of a corpus accurately: their sparse priors always fail to achieve true sparsity, and their regularizers ignore prior information about the relevance between sparse coefficients. In this paper, we propose a novel Bayesian hierarchical topic model, Bayesian Sparse Topical Coding with Poisson Distribution (BSTC-P), built on Sparse Topical Coding with Sparse Groups (STCSG). Unlike traditional STMs, it imposes a hierarchical sparse prior to exploit prior information about the relevance between sparse coefficients. Furthermore, we propose a sparsity-enhanced variant, Bayesian Sparse Topical Coding with Normal Distribution (BSTC-N), derived via mathematical approximation, which adopts a superior hierarchical sparsity-inducing prior with the aim of reaching the sparsest optimal solution. Experimental results on the Newsgroups and Twitter datasets show that both BSTC-P and BSTC-N find clearer latent semantic representations and, as a result, outperform existing work on document classification tasks.
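
The abstract does not give the BSTC update equations. As a rough, hypothetical illustration of what a hierarchical sparsity-inducing prior does under a Normal likelihood (loosely mirroring the Normal approximation behind BSTC-N), the sketch below uses the classic sparse Bayesian learning construction: each code s_k gets its own prior s_k ~ N(0, gamma_k), and the variances gamma_k are fitted by EM. This is an assumed stand-in, not the paper's algorithm. The function name sbl_codes, the dictionary Phi (words x topics), the fixed noise variance, and the iteration count are illustrative choices, and the codes are not constrained to be non-negative.

```python
import numpy as np


def sbl_codes(Phi, y, noise_var=0.01, n_iter=200, gamma_floor=1e-10):
    """Hypothetical sketch (not the BSTC-N algorithm): estimate codes s for
    y ~ N(Phi @ s, noise_var * I) under the hierarchical prior
    s_k ~ N(0, gamma_k), fitting the per-topic variances gamma_k by EM.

    Phi: (n_words, n_topics) dictionary; y: (n_words,) observation.
    Returns the posterior mean of s and the learned variances gamma.
    """
    n_topics = Phi.shape[1]
    gamma = np.ones(n_topics)          # one prior variance per topic code
    PtP = Phi.T @ Phi / noise_var      # likelihood precision term (fixed)
    Pty = Phi.T @ y / noise_var

    def posterior(g):
        # Gaussian posterior of s given prior variances g
        Sigma = np.linalg.inv(PtP + np.diag(1.0 / g))
        return Sigma, Sigma @ Pty

    for _ in range(n_iter):
        Sigma, mu = posterior(gamma)   # E-step
        # M-step: gamma_k = E[s_k^2] = mu_k^2 + Sigma_kk;
        # the floor only keeps 1/gamma finite
        gamma = np.maximum(mu**2 + np.diag(Sigma), gamma_floor)

    _, mu = posterior(gamma)           # posterior mean under the final gamma
    return mu, gamma
```

For example, with a random 40 x 20 Phi and a code vector that is nonzero only at topics 2, 7 and 15, the learned gamma for the other 17 topics should collapse far below those of the three active topics, shrinking their codes toward zero. A plain L1 penalty applies the same strength to every coefficient; per-coefficient hyperparameters like gamma_k are one common way a hierarchical prior lets the data decide how strongly each coefficient is shrunk.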
