Deep topic modeling by multilayer bootstrap network and lasso

Topic modeling is widely studied for the dimension reduction and analysis of documents. However, it is formulated as a difficult optimization problem, and current approximate solutions suffer from inaccurate model or data assumptions. To address these problems, we propose a polynomial-time deep topic model that makes no assumptions on the model or the data. Specifically, we first apply the multilayer bootstrap network (MBN), an unsupervised deep model, to reduce the dimension of documents, and then use the low-dimensional representations or their clustering results as the targets of supervised Lasso for topic-word discovery. To our knowledge, this is the first time that MBN and Lasso have been applied to unsupervised topic modeling. Experimental comparisons with five representative topic models on the 20-newsgroups and TDT2 corpora demonstrate the effectiveness of the proposed algorithm.
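
The sketch below is a rough illustration of the two-stage pipeline described above, written with scikit-learn and NumPy on the 20-newsgroups corpus. It is an assumption-laden sketch, not the paper's implementation: reduce_dim_mbn is a hypothetical stand-in for the actual multilayer bootstrap network (here a Gaussian random projection so the code runs end to end), and all parameter values (number of clusters, Lasso alpha, vocabulary size) are illustrative choices rather than the paper's settings.

import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

def reduce_dim_mbn(X, n_dims):
    # Placeholder (assumption) for MBN dimension reduction; a Gaussian
    # random projection keeps the sketch self-contained and runnable.
    rng = np.random.default_rng(0)
    return X @ rng.standard_normal((X.shape[1], n_dims))

# TF-IDF representation of the corpus.
docs = fetch_20newsgroups(subset="train",
                          remove=("headers", "footers", "quotes")).data
vectorizer = TfidfVectorizer(max_features=2000, stop_words="english")
X = vectorizer.fit_transform(docs).toarray()
vocab = np.array(vectorizer.get_feature_names_out())

# Stage 1: unsupervised dimension reduction (MBN in the paper), then
# clustering of the low-dimensional representations.
Z = reduce_dim_mbn(X, n_dims=20)
labels = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(Z)
Y = np.eye(20)[labels]  # one-hot cluster indicators used as targets

# Stage 2: supervised Lasso from the word space to each target column;
# words with the largest coefficients are reported as topic words.
topics = []
for k in range(Y.shape[1]):
    coef = Lasso(alpha=0.01, max_iter=5000).fit(X, Y[:, k]).coef_
    topics.append(vocab[np.argsort(coef)[::-1][:10]].tolist())

for k, words in enumerate(topics):
    print(f"topic {k}: {' '.join(words)}")

In this sketch the sparsity of the Lasso coefficients is what selects a small set of topic words per target dimension, which is the role Lasso plays in the proposed method.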
