LSTMs Can Learn Syntax-Sensitive Dependencies Well, But Modeling Structure Makes Them Better

Language exhibits hierarchical structure, but recent work using a subject-verb agreement diagnostic argued that state-of-the-art language models (LSTMs) fail to learn long-range syntax-sensitive dependencies. Using the same diagnostic, we show that, in fact, LSTMs do succeed in learning such dependencies, provided they have enough capacity. We then explore whether models that have access to explicit syntactic information learn agreement more effectively, and how the way in which this structural information is incorporated into the model impacts performance. We find that the mere presence of syntactic information does not improve accuracy, but when model architecture is determined by syntax, number agreement improves. Further, the order in which syntactic structure is built matters: top-down construction outperforms left-corner and bottom-up variants in capturing non-local structural dependencies.
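As a concrete illustration of the agreement diagnostic referred to above, the sketch below (a toy illustration, not the authors' implementation) scores an LSTM language model on a single agreement contrast: the model is counted as correct if it assigns higher probability to the verb form that agrees in number with the subject than to the competing form, even when other nouns intervene between subject and verb. The model, vocabulary, and example sentence are hypothetical placeholders written with PyTorch.

import torch
import torch.nn as nn

# Toy vocabulary; a real evaluation uses the language model's full vocabulary.
vocab = {w: i for i, w in enumerate(
    ["<unk>", "the", "keys", "key", "to", "cabinet", "are", "is"])}

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hid_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def next_word_logits(self, prefix_ids):
        # prefix_ids: (1, T) tensor of word indices for the sentence prefix.
        embedded = self.embed(prefix_ids)
        hidden_states, _ = self.lstm(embedded)
        # Logits over the word that follows the prefix.
        return self.out(hidden_states[:, -1, :])

def agreement_correct(model, prefix, correct_verb, wrong_verb):
    ids = torch.tensor([[vocab.get(w, vocab["<unk>"]) for w in prefix]])
    with torch.no_grad():
        logits = model.next_word_logits(ids).squeeze(0)
    # Correct if the grammatical verb form is preferred over the distractor.
    return bool(logits[vocab[correct_verb]] > logits[vocab[wrong_verb]])

model = LSTMLanguageModel(len(vocab))  # untrained here; the diagnostic is run on a trained LM
prefix = ["the", "keys", "to", "the", "cabinet"]  # plural subject, singular attractor noun
print(agreement_correct(model, prefix, "are", "is"))

In practice, accuracy over many such minimal pairs, typically broken down by the number of intervening nouns, serves as the measure of how well a model has learned the dependency.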
