Multi-Timescale Long Short-Term Memory Neural Network for Modelling Sentences and Documents

Neural network based methods have made great progress on a variety of natural language processing tasks. However, modelling long texts, such as sentences and documents, remains a challenging task. In this paper, we propose a multi-timescale long short-term memory (MT-LSTM) neural network to model long texts. MT-LSTM partitions the hidden states of a standard LSTM into several groups, and each group is activated and updated at a different time period. Thus, MT-LSTM can model very long documents as well as short sentences. Experiments on four benchmark datasets show that our model outperforms other neural models on the text classification task.
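
To make the grouping idea concrete, below is a minimal NumPy sketch of an LSTM whose hidden and cell states are partitioned into groups, where each group is only updated at its own time period and otherwise carries its previous state forward. The class name, the contiguous partitioning, and the "update group k every periods[k] steps" schedule are illustrative assumptions based on the abstract, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MultiTimescaleLSTM:
    """Sketch of an LSTM with hidden units split into groups that are
    activated at different time periods (assumed schedule: group k
    updates when t % periods[k] == 0; inactive groups copy their state)."""

    def __init__(self, input_size, hidden_size, periods=(1, 2, 4), seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_size = hidden_size
        self.periods = periods
        # Assign each hidden unit to one group via a contiguous split.
        bounds = np.linspace(0, hidden_size, len(periods) + 1).astype(int)
        self.groups = [np.arange(bounds[i], bounds[i + 1])
                       for i in range(len(periods))]
        # Stacked weights for the input, forget, output, and candidate gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_size, input_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)

    def step(self, x, h_prev, c_prev, t):
        # Standard LSTM gate computations on the full state.
        z = self.W @ np.concatenate([x, h_prev]) + self.b
        i, f, o, g = np.split(z, 4)
        c_full = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
        h_full = sigmoid(o) * np.tanh(c_full)
        # Only groups whose period divides the current step are updated;
        # the remaining groups keep their previous hidden and cell states.
        h, c = h_prev.copy(), c_prev.copy()
        for period, idx in zip(self.periods, self.groups):
            if t % period == 0:
                h[idx], c[idx] = h_full[idx], c_full[idx]
        return h, c

    def forward(self, xs):
        # xs: sequence of word vectors; returns the final hidden state,
        # usable as a fixed-length representation of the text.
        h = np.zeros(self.hidden_size)
        c = np.zeros(self.hidden_size)
        for t, x in enumerate(xs):
            h, c = self.step(x, h, c, t)
        return h
```

Because the slow groups are updated only every few steps, their states change gradually and can retain information over long spans of a document, while the fast group (period 1) tracks local, word-level context as in a standard LSTM.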
