Linguistic generalization and compositionality in modern artificial neural networks

In the last decade, deep artificial neural networks have achieved astounding performance in many natural language processing tasks. Given the high productivity of language, these models must possess effective generalization abilities. It is widely assumed that humans handle linguistic productivity by means of algebraic compositional rules: are deep networks similarly compositional? After reviewing the main innovations characterizing current deep language-processing networks, I discuss a set of studies suggesting that deep networks are capable of subtle grammar-dependent generalizations, but also that they do not rely on systematic compositional rules. I argue that the intriguing behaviour of these devices (still awaiting a full understanding) should be of interest to linguists and cognitive scientists, as it offers a new perspective on possible computational strategies to deal with linguistic productivity beyond rule-based compositionality, and it might lead to new insights into the less systematic generalization patterns that also appear in natural language. This article is part of the theme issue ‘Towards mechanistic models of meaning composition’.
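
To make the notion of a systematic compositionality test concrete, the following is a minimal illustrative sketch, not the article's actual experimental code, in the spirit of the SCAN-style benchmarks discussed in the studies reviewed here. The toy lexicon, the `interpret` function and the specific held-out primitive (`jump`) are assumptions introduced purely for illustration: commands built from a small compositional grammar are paired with action sequences, and the split withholds every composed use of one primitive, so that a learner can only succeed on the test set by applying the modifier rule to a word it has seen only in isolation.

```python
# Illustrative sketch of a compositional-generalization split (SCAN-like).
# All names and the toy grammar are hypothetical, chosen for brevity.

# Toy lexicon: each primitive command maps to a single action token.
PRIMITIVES = {"walk": "WALK", "run": "RUN", "look": "LOOK", "jump": "JUMP"}
# Modifiers repeat the action of the preceding primitive.
MODIFIERS = {"twice": 2, "thrice": 3}

def interpret(command: str) -> str:
    """Compositionally map a command such as 'jump twice' to an action sequence."""
    words = command.split()
    action = PRIMITIVES[words[0]]
    repeat = MODIFIERS[words[1]] if len(words) > 1 else 1
    return " ".join([action] * repeat)

# All command/action pairs licensed by the toy grammar.
pairs = [(f"{p} {m}".strip(), interpret(f"{p} {m}".strip()))
         for p in PRIMITIVES for m in ["", *MODIFIERS]]

# Systematic split: 'jump' occurs only in isolation during training,
# while all of its composed forms are reserved for the test set.
train = [(c, a) for c, a in pairs if "jump" not in c or c == "jump"]
test = [(c, a) for c, a in pairs if "jump" in c and c != "jump"]

print("train:", train)
print("test :", test)
```

A rule-based compositional learner trivially solves such a split; the reviewed work asks whether sequence-processing networks trained on the training pairs do so as well, and typically finds that they generalize well on random splits but fail on systematic ones like this.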
