Colorless Green Recurrent Networks Dream Hierarchically

Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language. We investigate here to what extent RNNs learn to track abstract hierarchical syntactic structure. We test whether RNNs trained with a generic language modeling objective in four languages (Italian, English, Hebrew, Russian) can predict long-distance number agreement in various constructions. We include in our evaluation nonsensical sentences where RNNs cannot rely on semantic or lexical cues ("The colorless green ideas I ate with the chair sleep furiously"), and, for Italian, we compare model performance to human intuitions. Our language-model-trained RNNs make reliable predictions about long-distance agreement, and do not lag far behind human performance. We thus bring support to the hypothesis that RNNs are not just shallow-pattern extractors, but that they also acquire deeper grammatical competence.
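As a rough illustration of the evaluation protocol described above, the sketch below checks whether a word-level LSTM language model assigns higher probability to the correctly inflected target verb than to the wrongly inflected one, given the sentence prefix containing the agreement controller and intervening material. The `LSTMLanguageModel` class, the toy vocabulary, and the example item are illustrative assumptions, not the authors' actual models or test data.

```python
# Minimal sketch (assumed setup, not the paper's code) of number-agreement
# evaluation: compare the LM probability of the correct vs. wrong verb form
# at the agreement target, with the sentence prefix held fixed.

import torch
import torch.nn as nn


class LSTMLanguageModel(nn.Module):
    """Small word-level LSTM language model (hypothetical architecture)."""

    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.out(hidden)  # next-word logits at every position


def prefers_correct_form(model, vocab, prefix, correct, wrong):
    """True if the LM gives the correct verb form higher probability
    than the wrong one, conditioned on the sentence prefix."""
    model.eval()
    with torch.no_grad():
        ids = torch.tensor([[vocab[w] for w in prefix]])
        logits = model(ids)[0, -1]                     # next-word distribution
        log_probs = torch.log_softmax(logits, dim=-1)
        return bool(log_probs[vocab[correct]] > log_probs[vocab[wrong]])


# Illustrative usage on a nonsensical ("colorless green") test item,
# with an untrained toy model, purely to show the interface.
words = ("the colorless green ideas i ate with the chair sleep sleeps").split()
vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}
model = LSTMLanguageModel(len(vocab))

prefix = "the colorless green ideas i ate with the chair".split()
print(prefers_correct_form(model, vocab, prefix, correct="sleep", wrong="sleeps"))
```

Accuracy over a set of such items (the fraction of items where the correct form wins) is the kind of aggregate measure this evaluation style yields.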
