Learning to parse from a semantic objective: It works. Is it syntax?

Recent work on reinforcement learning and other gradient estimators for latent tree learning has made it possible to train neural networks that learn to both parse a sentence and use the resulting parse to interpret the sentence, all without exposure to ground-truth parse trees at training time. Surprisingly, these models often perform better at sentence understanding tasks than models that use parse trees from conventional parsers. This paper aims to investigate what these latent tree learning models learn. We replicate two such models in a shared codebase and find that (i) they do outperform baselines on sentence classification, but that (ii) their parsing strategies are not especially consistent across random restarts, (iii) the parses they produce tend to be shallower than PTB parses, and (iv) these do not resemble those of PTB or of any other recognizable semantic or syntactic grammar formalism.

[1]  G. Frege Über Sinn und Bedeutung , 1892 .

[2]  Noam Chomsky,et al.  वाक्यविन्यास का सैद्धान्तिक पक्ष = Aspects of the theory of syntax , 1965 .

[3]  K. S. Fu,et al.  Introduction to Syntactic Pattern Recognition , 1977 .

[4]  I. Sag Linguistic Theory and Natural Language Processing , 1991 .

[5]  Colin Giles,et al.  Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory (cid:3) , 1992 .

[6]  R. J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[7]  Stanley F. Chen,et al.  Bayesian Grammar Induction for Language Modeling , 1995, ACL.

[8]  Irene Heim,et al.  Semantics in generative grammar , 1998 .

[9]  David G. Stork,et al.  Pattern Classification (2nd ed.) , 1999 .

[10]  Dan Klein,et al.  Accurate Unlexicalized Parsing , 2003, ACL.

[11]  David Adger,et al.  Core Syntax: A Minimalist Approach , 2003 .

[12]  Noah A. Smith,et al.  Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance , 2011, EMNLP.

[13]  Sham M. Kakade,et al.  Identifiability and Unmixing of Latent Parse Trees , 2012, NIPS.

[14]  Edward P. Stabler,et al.  An Introduction to Syntactic Analysis and Theory , 2013 .

[15]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[16]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[17]  Mark Dredze,et al.  Low-Resource Semantic Role Labeling , 2014, ACL.

[18]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[19]  Christopher D. Manning,et al.  Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks , 2015, ACL.

[20]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[21]  Tomas Mikolov,et al.  Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets , 2015, NIPS.

[22]  Phil Blunsom,et al.  Learning to Transduce with Unbounded Memory , 2015, NIPS.

[23]  Christopher Potts,et al.  A large annotated corpus for learning natural language inference , 2015, EMNLP.

[24]  Christopher Potts,et al.  A Fast Unified Model for Parsing and Sentence Understanding , 2016, ACL.

[25]  Emmanuel Dupoux,et al.  Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies , 2016, TACL.

[26]  Yoshimasa Tsuruoka,et al.  Tree-to-Sequence Attentional Neural Machine Translation , 2016, ACL.

[27]  Rui Yan,et al.  Natural Language Inference by Tree-Based Convolution and Heuristic Matching , 2015, ACL.

[28]  Alexander M. Rush,et al.  Structured Attention Networks , 2017, ICLR.

[29]  Jihun Choi,et al.  Unsupervised Learning of Task-Specific Tree Structures with Tree-LSTMs , 2017, ArXiv.

[30]  Ben Poole,et al.  Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.

[31]  C. Lee Giles,et al.  The Neural Network Pushdown Automaton: Model, Stack and Learning Simulations , 2017, ArXiv.

[32]  Wang Ling,et al.  Learning to Compose Words into Sentences with Reinforcement Learning , 2016, ICLR.

[33]  Yoshua Bengio,et al.  Hierarchical Multiscale Recurrent Neural Networks , 2016, ICLR.

[34]  Hong Yu,et al.  Neural Tree Indexers for Text Understanding , 2016, EACL.

[35]  Yonatan Belinkov,et al.  What do Neural Machine Translation Models Learn about Morphology? , 2017, ACL.

[36]  Hong Yu,et al.  Neural Semantic Encoders , 2016, EACL.

[37]  Yang Liu,et al.  Learning Structured Text Representations , 2017, TACL.

[38]  Samuel R. Bowman,et al.  A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference , 2017, NAACL.

[39]  Stephen Clark,et al.  Jointly learning sentence embeddings and syntax with unsupervised Tree-LSTMs , 2017, Natural Language Engineering.