TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency

In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based language model designed to directly capture the global semantic meaning relating words in a document via latent topics. Because of their sequential nature, RNNs are good at capturing the local structure of a word sequence – both semantic and syntactic – but might face difficulty remembering long-range dependencies. Intuitively, these long-range dependencies are of a semantic nature. In contrast, latent topic models are able to capture the global underlying semantic structure of a document but do not account for word ordering. The proposed TopicRNN model integrates the merits of RNNs and latent topic models: it captures local (syntactic) dependencies using an RNN and global (semantic) dependencies using latent topics. Unlike previous work on contextual RNN language modeling, our model is learned end-to-end. Empirical results on word prediction show that TopicRNN outperforms existing contextual RNN baselines. In addition, TopicRNN can be used as an unsupervised feature extractor for documents. We do this for sentiment analysis and report a new state-of-the-art error rate on the IMDB movie review dataset that amounts to a 13.3% improvement over the previous best result. Finally, TopicRNN also yields sensible topics, making it a useful alternative to document models such as latent Dirichlet allocation.
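
To make the architecture concrete, below is a minimal sketch of one way such a model can be wired up: a GRU captures the local word sequence, an inference network maps the document's bag-of-words to a Gaussian over a latent topic vector, and that vector adds a global semantic bias to the next-word logits so the whole model can be trained end-to-end. This is an illustrative assumption, not the authors' implementation; the class name, layer sizes (e.g. the 256-unit encoder, num_topics=50), and the omission of further modeling details are all choices made only for this sketch.

```python
# A minimal sketch (assumed, simplified): an RNN handles local (syntactic)
# structure, while a latent topic vector inferred from the document's
# bag-of-words adds a global (semantic) bias to the word logits, so both
# parts can be trained end-to-end.
import torch
import torch.nn as nn

class TopicRNNSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim=200, hidden_dim=200, num_topics=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Inference network: bag-of-words -> Gaussian over the topic vector.
        self.encoder = nn.Sequential(nn.Linear(vocab_size, 256), nn.ReLU())
        self.mu = nn.Linear(256, num_topics)
        self.logvar = nn.Linear(256, num_topics)
        # Word logits get a local contribution (RNN state) and a global one (topics).
        self.rnn_out = nn.Linear(hidden_dim, vocab_size)
        self.topic_out = nn.Linear(num_topics, vocab_size, bias=False)

    def forward(self, tokens, bow):
        # tokens: (batch, seq_len) word ids; bow: (batch, vocab_size) float counts.
        h, _ = self.rnn(self.embed(tokens))                        # local dependencies
        enc = self.encoder(bow)
        mu, logvar = self.mu(enc), self.logvar(enc)
        theta = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        logits = self.rnn_out(h) + self.topic_out(theta).unsqueeze(1)  # global semantic bias
        return logits, mu, logvar  # KL(mu, logvar || N(0, I)) would enter the training loss
```

In this sketch, training would combine a cross-entropy language-modeling loss on the logits with the KL term for (mu, logvar), and at test time the inferred mu could serve as the kind of unsupervised document feature the abstract uses for sentiment classification.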
