Predicting Embedded Syntactic Structures from Natural Language Sentences with Neural Network Approaches

Syntactic parsing is a key component of natural language understanding and, traditionally, produces symbolic output. Recently, a new approach to predicting syntactic structures from sentences has emerged: directly producing small, expressive vectors that embed the syntactic structures, so that parsing yields distributed representations. In this paper, we advance the frontier of these novel predictors by exploiting the learning capabilities of neural networks. We propose two approaches for predicting embedded syntactic structures. The first uses a multi-layer perceptron to learn how to map vectors representing sentences onto embedded syntactic structures. The second exploits recurrent neural networks with long short-term memory (LSTM-RNN-DRP) to map sentences directly onto these embedded structures. We show that both approaches successfully exploit word information to learn syntactic predictors and achieve a significant performance advantage over previous methods. Results on the Penn Treebank corpus are promising: with the LSTM-RNN-DRP, we improve over the previous state-of-the-art method by 8.68%.
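As a rough illustration of the second approach, the sketch below shows an LSTM that maps a sequence of word ids onto a fixed-size vector intended to approximate the distributed representation (embedding) of the sentence's parse tree. This is a minimal PyTorch sketch, not the authors' implementation; the layer sizes, the 4096-dimensional tree-vector space, and the cosine-similarity loss are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): an LSTM that reads a word-id
# sequence and predicts a fixed-size vector meant to approximate the
# distributed representation of the sentence's syntactic tree.
# All names, dimensions, and the cosine loss are illustrative assumptions.
import torch
import torch.nn as nn

class SentenceToTreeEmbedding(nn.Module):
    def __init__(self, vocab_size, word_dim=100, hidden_dim=512, tree_dim=4096):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim)
        self.lstm = nn.LSTM(word_dim, hidden_dim, batch_first=True)
        self.project = nn.Linear(hidden_dim, tree_dim)

    def forward(self, word_ids):
        # word_ids: (batch, seq_len) integer tensor
        vectors = self.embed(word_ids)
        _, (last_hidden, _) = self.lstm(vectors)
        # Use the final hidden state as a summary of the sentence,
        # then project it into the space of distributed tree vectors.
        return self.project(last_hidden[-1])

# Hypothetical training step: maximize cosine similarity between the
# predicted vector and a precomputed gold tree embedding.
model = SentenceToTreeEmbedding(vocab_size=10000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
word_ids = torch.randint(0, 10000, (8, 20))   # toy batch of 8 sentences
gold_trees = torch.randn(8, 4096)             # stand-in for gold tree vectors
pred = model(word_ids)
loss = 1.0 - nn.functional.cosine_similarity(pred, gold_trees).mean()
loss.backward()
optimizer.step()
```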
