Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine words into phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).
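
One of the variants the paper proposes, the Child-Sum Tree-LSTM, composes a node's representation from an arbitrary, unordered set of children: the children's hidden states are summed before driving the input, output, and candidate-update gates, while each child receives its own forget gate conditioned on that child's hidden state. The NumPy sketch below illustrates this single-node update; the class and parameter names are illustrative choices for exposition, not the authors' reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ChildSumTreeLSTMCell:
    """One node update of a Child-Sum Tree-LSTM (a sketch, not the authors' code)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One (W, U, b) parameter triple per gate: input (i), forget (f),
        # output (o), and the candidate cell update (u).
        self.params = {
            gate: (rng.normal(0.0, 0.1, (hidden_dim, input_dim)),   # W: input-to-hidden
                   rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)),  # U: child-hidden-to-hidden
                   np.zeros(hidden_dim))                            # b: bias
            for gate in "ifou"
        }

    def forward(self, x, child_h, child_c):
        """x: input vector at this node; child_h/child_c: lists of child states."""
        Wi, Ui, bi = self.params["i"]
        Wf, Uf, bf = self.params["f"]
        Wo, Uo, bo = self.params["o"]
        Wu, Uu, bu = self.params["u"]
        # Children's hidden states are summed; a leaf has no children.
        h_tilde = np.sum(child_h, axis=0) if child_h else np.zeros_like(bi)
        i = sigmoid(Wi @ x + Ui @ h_tilde + bi)   # input gate
        o = sigmoid(Wo @ x + Uo @ h_tilde + bo)   # output gate
        u = np.tanh(Wu @ x + Uu @ h_tilde + bu)   # candidate update
        # Each child gets its own forget gate, conditioned on that child's
        # hidden state, so the node can keep or drop memory per child.
        f = [sigmoid(Wf @ x + Uf @ h_k + bf) for h_k in child_h]
        c = i * u + sum(f_k * c_k for f_k, c_k in zip(f, child_c))
        h = o * np.tanh(c)
        return h, c

# Composing a two-leaf subtree bottom-up:
cell = ChildSumTreeLSTMCell(input_dim=4, hidden_dim=3)
h1, c1 = cell.forward(np.ones(4), [], [])                       # leaf
h2, c2 = cell.forward(np.full(4, 0.5), [], [])                  # leaf
h_root, c_root = cell.forward(np.zeros(4), [h1, h2], [c1, c2])  # parent
```

With exactly one child, this update reduces to the standard linear-chain LSTM, which is the sense in which the Tree-LSTM generalizes it.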
