textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior

We address two challenges of probabilistic topic modelling in order to better estimate the probability of a word in a given context, i.e., P(word|context): (1) No language structure in context: Probabilistic topic models (TMs) ignore word order by summarizing a given context as a "bag-of-words", and consequently the semantics of the words in the context is lost. We incorporate language structure by combining a neural autoregressive TM (e.g., DocNADE) with an LSTM-based language model (LSTM-LM) in a unified probabilistic framework, named ctx-DocNADE. The LSTM-LM learns a vector-space representation of each word by accounting for word order in local collocation patterns and models complex characteristics of language (e.g., syntax and semantics), while the TM simultaneously learns a latent representation of the entire document and discovers its underlying thematic structure; the combination thus unites two complementary paradigms of learning the meaning of word occurrences. (2) Limited context and/or a small training corpus: In settings with few word occurrences (i.e., lack of context) in short texts, or data sparsity in a corpus of few documents, applying TMs is challenging. We address this challenge by incorporating external knowledge into neural autoregressive topic models via a language-modelling approach: we use pre-trained word embeddings as input to the LSTM-LM with the aim of improving the word-topic mapping on a smaller and/or short-text corpus. This DocNADE extension is named ctx-DocNADEe. The proposed neural autoregressive topic model variants, coupled with neural LMs and embedding priors, consistently outperform state-of-the-art generative TMs in terms of generalization (perplexity), interpretability (topic coherence) and applicability (retrieval and classification) over 6 long-text and 8 short-text datasets from diverse domains.
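
The following is a minimal PyTorch sketch of the composition idea described above, not the authors' implementation: a DocNADE-style autoregressive topic hidden state over the preceding words is combined with an LSTM-LM hidden state computed over (pre-trained) word embeddings before predicting each word. The class name, layer sizes, the mixing weight `mix_weight`, and the exact additive combination are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CtxDocNADESketch(nn.Module):
    """Sketch of combining a DocNADE-style TM with an LSTM-LM (assumed details)."""
    def __init__(self, vocab_size, hidden_size=200, emb_size=300, mix_weight=0.5):
        super().__init__()
        # DocNADE-style parameters: W encodes words into the topic hidden space,
        # U decodes a hidden state back to a distribution over the vocabulary.
        self.W = nn.Embedding(vocab_size, hidden_size)   # word -> topic space
        self.U = nn.Linear(hidden_size, vocab_size)      # topic space -> vocab logits
        # LSTM-LM over word embeddings; in the paper these would be pre-trained
        # (e.g., GloVe), here they are randomly initialised for self-containment.
        self.emb = nn.Embedding(vocab_size, emb_size)
        self.lstm = nn.LSTM(emb_size, hidden_size, batch_first=True)
        self.lam = mix_weight                             # assumed mixing weight

    def forward(self, doc):
        # doc: LongTensor of shape (batch, seq_len) holding word indices.
        # Topic-model path: sum of the encodings of the *preceding* words
        # approximates DocNADE's order-agnostic autoregressive hidden state.
        tm_inputs = self.W(doc)                                    # (B, T, H)
        prefix_sum = torch.cumsum(tm_inputs, dim=1) - tm_inputs    # exclude current word
        tm_hidden = torch.sigmoid(prefix_sum)
        # Language-model path: order-aware LSTM states, shifted by one position
        # so that the state at step t only sees words before t.
        emb = self.emb(doc)                                        # (B, T, E)
        emb_shifted = torch.cat([torch.zeros_like(emb[:, :1]), emb[:, :-1]], dim=1)
        lm_hidden, _ = self.lstm(emb_shifted)                      # (B, T, H)
        # Compose both views and predict each word from its combined context.
        combined = tm_hidden + self.lam * lm_hidden
        return self.U(combined)                                    # logits, (B, T, V)

# Usage: score a toy batch of two 12-word documents.
model = CtxDocNADESketch(vocab_size=1000)
docs = torch.randint(0, 1000, (2, 12))
logits = model(docs)                                               # (2, 12, 1000)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), docs.reshape(-1))
```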

[1] Zhe Gan, et al. Topic Compositional Neural Language Model, 2017, AISTATS.

[2] Hinrich Schütze, et al. Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time, 2017, NAACL.

[3] Timothy Baldwin, et al. Topically Driven Neural Language Model, 2017, ACL.

[4] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[5] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[6] Yoshua Bengio, et al. A Neural Probabilistic Language Model, 2003, J. Mach. Learn. Res.

[7] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.

[8] Michael Röder, et al. Exploring the Space of Topic Coherence Measures, 2015, WSDM.

[9] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[10] Hugo Larochelle, et al. The Neural Autoregressive Distribution Estimator, 2011, AISTATS.

[11] David Newman, et al. External Evaluation of Topic Models, 2009.

[12] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.

[13] Alexander J. Smola, et al. Word Features for Latent Dirichlet Allocation, 2010, NIPS.

[14] Chong Wang, et al. TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency, 2016, ICLR.

[15] Sebastian Thrun, et al. Text Classification from Labeled and Unlabeled Documents using EM, 2000, Machine Learning.

[16] Rajarshi Das, et al. Gaussian LDA for Topic Models with Word Embeddings, 2015, ACL.

[17] Hugo Larochelle, et al. A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18] Geoffrey E. Hinton, et al. Replicated Softmax: an Undirected Topic Model, 2009, NIPS.

[19] Quoc V. Le, et al. Distributed Representations of Sentences and Documents, 2014, ICML.

[20] Hugo Larochelle, et al. A Neural Autoregressive Topic Model, 2012, NIPS.

[21] Heng Ji, et al. A Novel Neural Topic Model and Its Supervised Extension, 2015, AAAI.

[22] Charles A. Sutton, et al. Autoencoding Variational Inference for Topic Models, 2017, ICLR.

[23] Dat Quoc Nguyen, et al. Improving Topic Models with Latent Feature Word Representations, 2015, TACL.

[24] Andrew W. Senior, et al. Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling, 2014, INTERSPEECH.

[25] Hanna M. Wallach, et al. Topic Modeling: Beyond Bag-of-Words, 2006, ICML.

[26] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[27] Hinrich Schütze, et al. Document Informed Neural Autoregressive Topic Models with Distributional Prior, 2018, AAAI.

[28] A. McCallum, et al. Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval, 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[29] Lukás Burget, et al. Recurrent Neural Network Based Language Model, 2010, INTERSPEECH.

[30] Chong Wang, et al. Reading Tea Leaves: How Humans Interpret Topic Models, 2009, NIPS.

[31] Mehran Sahami, et al. A Web-Based Kernel Function for Measuring the Similarity of Short Text Snippets, 2006, WWW '06.

[32] Noah A. Smith, et al. A Neural Framework for Generalized Topic Models, 2017, ArXiv.

[33] Hugo Larochelle, et al. Document Neural Autoregressive Distribution Estimation, 2016, J. Mach. Learn. Res.

[34] Yoshua Bengio, et al. Classification Using Discriminative Restricted Boltzmann Machines, 2008, ICML '08.