Can RNNs learn Recursive Nested Subject-Verb Agreements?

One of the fundamental principles of contemporary linguistics states that language processing requires the ability to extract recursively nested tree structures. However, it remains unclear whether and how this code could be implemented in neural circuits. Recent advances in Recurrent Neural Networks (RNNs), which achieve near-human performance in some language tasks, provide a compelling model to address such questions. Here, we present a new framework to study recursive processing in RNNs, using subject-verb agreement as a probe into the representations of the neural network. We trained six distinct types of RNNs on a simplified probabilistic context-free grammar designed to independently manipulate the length of a sentence and the depth of its syntactic tree. All RNNs generalized to subject-verb dependencies longer than those seen during training. However, none systematically generalized to deeper tree structures, even those with a structural bias towards learning nested trees (i.e., stack-RNNs). In addition, our analyses revealed primacy and recency effects in the generalization patterns of LSTM-based models, showing that these models tend to perform well on the outer- and innermost parts of a center-embedded tree structure, but poorly on its middle levels. Finally, probing the internal states of the model during the processing of sentences with nested tree structures, we found a complex encoding of grammatical agreement information (e.g., grammatical number), in which the agreement information for multiple nouns was carried by a single unit. Taken together, these results indicate how neural networks may extract bounded nested tree structures, without learning a systematic recursive rule.
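For concreteness, the sketch below (Python) shows one way a grammar with this property could be set up; the lexicon, rules, and parameter names (`depth`, `pp_len`) are illustrative assumptions, not the grammar actually used in this work. The key point is that embedding depth (number of nested object-relative clauses) and linear distance between each subject and its verb (number of intervening prepositional phrases) can be varied independently.

```python
import random

# Toy lexicon; purely illustrative, not the paper's actual vocabulary.
NOUNS = {"sg": ["boy", "cat", "farmer"], "pl": ["boys", "cats", "farmers"]}
V_INTRANS = {"sg": ["runs", "smiles", "sleeps"], "pl": ["run", "smile", "sleep"]}
V_TRANS = {"sg": ["admires", "chases", "watches"], "pl": ["admire", "chase", "watch"]}
PPS = ["near the tree", "behind the wall", "under the lamp"]

def clause(depth, pp_len, outermost=True):
    """Generate one toy center-embedded clause.

    depth  -- number of nested object-relative clauses (tree depth)
    pp_len -- number of prepositional phrases (0..3) inserted between each
              subject and its verb (linear distance, varied independently of depth)
    """
    number = random.choice(["sg", "pl"])
    subject = "the " + random.choice(NOUNS[number])
    filler = " ".join(random.sample(PPS, k=pp_len)) if pp_len else ""
    # The outermost verb is intransitive; embedded verbs are transitive,
    # taking the noun one level up as their extracted object.
    verb = random.choice((V_INTRANS if outermost else V_TRANS)[number])
    parts = [subject]
    if depth > 1:
        parts.append("that " + clause(depth - 1, pp_len, outermost=False))
    if filler:
        parts.append(filler)
    parts.append(verb)
    return " ".join(parts)

if __name__ == "__main__":
    random.seed(0)
    print(clause(depth=1, pp_len=2))  # long subject-verb distance, shallow tree
    print(clause(depth=3, pp_len=0))  # short clauses, deep center-embedding
```

In such a setup, every verb must agree in number with the subject at its own embedding level by construction, so a model's accuracy on each verb directly probes whether it tracked the corresponding level of the nested tree.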
