A Discrete Variational Recurrent Topic Model without the Reparametrization Trick

We show how to learn a neural topic model with discrete random variables, one that explicitly models each word's assigned topic, using neural variational inference that does not rely on stochastic backpropagation to handle the discrete variables. The model we use combines the expressive power of neural methods for representing sequences of text with the topic model's ability to capture global, thematic coherence. Using neural variational inference, we show improved perplexity and document understanding across multiple corpora. We examine the effect of prior parameters on both the model and the variational parameters, and demonstrate how our approach can compete with, and even surpass, a popular topic model implementation on an automatic measure of topic quality.
