Contextual LSTM (CLSTM) Models for Large-Scale NLP Tasks

Documents exhibit sequential structure at multiple levels of abstraction (e.g., sentences, paragraphs, sections). These abstractions constitute a natural hierarchy for representing the context in which to infer the meaning of words and larger fragments of text. In this paper, we present CLSTM (Contextual LSTM), an extension of the recurrent neural network LSTM (Long Short-Term Memory) model in which we incorporate contextual features (e.g., topics) into the model. We evaluate CLSTM on three specific NLP tasks: word prediction, next sentence selection, and sentence topic prediction. Results from experiments run on two corpora, English documents in Wikipedia and a subset of articles from a recent snapshot of English Google News, indicate that using both words and topics as features improves the performance of CLSTM models over baseline LSTM models on these tasks. For example, on the next sentence selection task, we obtain relative accuracy improvements of 21% on the Wikipedia dataset and 18% on the Google News dataset. These results demonstrate the significant benefit of using context appropriately in natural language (NL) tasks, with implications for a wide variety of NL applications such as question answering, sentence completion, paraphrase generation, and next utterance prediction in dialog systems.
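The abstract describes the core idea (feeding contextual features such as topics into an LSTM alongside the words) without fixing an implementation. The following is a minimal sketch in PyTorch of one plausible reading, where a topic id for the enclosing segment is embedded and concatenated with each word embedding before the recurrent layer. The class name, layer dimensions, and the concatenation scheme are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn


class CLSTM(nn.Module):
    """Sketch of a Contextual LSTM: a topic feature for the current segment
    is embedded and concatenated with each word embedding before the LSTM."""

    def __init__(self, vocab_size, num_topics,
                 word_dim=256, topic_dim=64, hidden_dim=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.topic_emb = nn.Embedding(num_topics, topic_dim)
        # LSTM consumes the concatenated [word; topic] vector at each step.
        self.lstm = nn.LSTM(word_dim + topic_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)  # next-word prediction head

    def forward(self, words, topics):
        # words:  (batch, seq_len) word ids
        # topics: (batch, seq_len) topic ids of the enclosing sentence/paragraph
        x = torch.cat([self.word_emb(words), self.topic_emb(topics)], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)  # (batch, seq_len, vocab_size) next-word logits


# Usage example with toy data (all sizes are placeholders):
model = CLSTM(vocab_size=10000, num_topics=100)
words = torch.randint(0, 10000, (2, 20))
topics = torch.randint(0, 100, (2, 20))
logits = model(words, topics)  # shape (2, 20, 10000)
```

The same backbone could serve the paper's other two tasks by swapping the output head, e.g., scoring candidate next sentences or classifying the topic of the next sentence from the final hidden state.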
