Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine words into phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).
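
One of the variants the paper proposes, the Child-Sum Tree-LSTM, composes a node's representation from an arbitrary, unordered set of children: the children's hidden states are summed before driving the input, output, and candidate-update gates, while each child receives its own forget gate conditioned on that child's hidden state. The NumPy sketch below illustrates this single-node update; the class and parameter names are illustrative choices for exposition, not the authors' reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ChildSumTreeLSTMCell:
    """One node update of a Child-Sum Tree-LSTM (a sketch, not the authors' code)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One (W, U, b) parameter triple per gate: input (i), forget (f),
        # output (o), and the candidate cell update (u).
        self.params = {
            gate: (rng.normal(0.0, 0.1, (hidden_dim, input_dim)),   # W: input-to-hidden
                   rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)),  # U: child-hidden-to-hidden
                   np.zeros(hidden_dim))                            # b: bias
            for gate in "ifou"
        }

    def forward(self, x, child_h, child_c):
        """x: input vector at this node; child_h/child_c: lists of child states."""
        Wi, Ui, bi = self.params["i"]
        Wf, Uf, bf = self.params["f"]
        Wo, Uo, bo = self.params["o"]
        Wu, Uu, bu = self.params["u"]
        # Children's hidden states are summed; a leaf has no children.
        h_tilde = np.sum(child_h, axis=0) if child_h else np.zeros_like(bi)
        i = sigmoid(Wi @ x + Ui @ h_tilde + bi)   # input gate
        o = sigmoid(Wo @ x + Uo @ h_tilde + bo)   # output gate
        u = np.tanh(Wu @ x + Uu @ h_tilde + bu)   # candidate update
        # Each child gets its own forget gate, conditioned on that child's
        # hidden state, so the node can keep or drop memory per child.
        f = [sigmoid(Wf @ x + Uf @ h_k + bf) for h_k in child_h]
        c = i * u + sum(f_k * c_k for f_k, c_k in zip(f, child_c))
        h = o * np.tanh(c)
        return h, c

# Composing a two-leaf subtree bottom-up:
cell = ChildSumTreeLSTMCell(input_dim=4, hidden_dim=3)
h1, c1 = cell.forward(np.ones(4), [], [])                       # leaf
h2, c2 = cell.forward(np.full(4, 0.5), [], [])                  # leaf
h_root, c_root = cell.forward(np.zeros(4), [h1, h2], [c1, c2])  # parent
```

With exactly one child, this update reduces to the standard linear-chain LSTM, which is the sense in which the Tree-LSTM generalizes it.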
