Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks

Systematic compositionality is the ability to recombine meaningful units with regular and predictable outcomes, and it is widely seen as key to the human capacity for generalization in language. Recent work (Lake and Baroni, 2018) studied systematic compositionality in modern seq2seq models, using generalization to novel navigation instructions in a grounded environment as a probing tool. Lake and Baroni’s main experiment required the models to quickly bootstrap the meaning of new words. We extend this framework to settings where the model needs only to recombine well-trained functional words (such as “around” and “right”) in novel contexts. Our findings confirm and strengthen the earlier ones: seq2seq models can be impressively good at generalizing to novel combinations of previously seen input, but only when they receive extensive training on the specific pattern to be generalized (e.g., generalizing from many examples of “X around right” to “jump around right”). They fail when generalization requires novel application of compositional rules (e.g., inferring the meaning of “around right” from the meanings of “right” and “around”).
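
The command-to-action mapping behind these examples follows the SCAN grammar of Lake and Baroni (2018): “jump right” maps to RTURN JUMP, while “jump around right” repeats that turn-and-act pair four times. Purely as an illustration, the minimal Python sketch below spells out those compositional rules; the interpreter, and names such as interpret and PRIMITIVES, are ours and not taken from the original codebase.

# Illustrative sketch of SCAN-style command semantics (not the paper's code).
PRIMITIVES = {"walk": ["WALK"], "run": ["RUN"], "jump": ["JUMP"],
              "look": ["LOOK"], "turn": []}
TURNS = {"left": "LTURN", "right": "RTURN"}

def interpret(command):
    """Map a space-separated command string to its action sequence."""
    words = command.split()
    # Conjunctions: "x1 and x2" executes x1 then x2; "x1 after x2" executes x2 then x1.
    if "and" in words:
        i = words.index("and")
        return interpret(" ".join(words[:i])) + interpret(" ".join(words[i + 1:]))
    if "after" in words:
        i = words.index("after")
        return interpret(" ".join(words[i + 1:])) + interpret(" ".join(words[:i]))
    # Repetition: "x twice" and "x thrice".
    if words[-1] == "twice":
        return interpret(" ".join(words[:-1])) * 2
    if words[-1] == "thrice":
        return interpret(" ".join(words[:-1])) * 3
    # Directional modifiers applied to a primitive verb.
    if words[-1] in TURNS:
        turn, rest = TURNS[words[-1]], words[:-1]
        if rest[-1] == "around":          # e.g. "jump around right"
            return ([turn] + interpret(" ".join(rest[:-1]))) * 4
        if rest[-1] == "opposite":        # e.g. "walk opposite left"
            return [turn, turn] + interpret(" ".join(rest[:-1]))
        return [turn] + interpret(" ".join(rest))   # e.g. "jump right"
    # Bare primitive verb.
    return PRIMITIVES[words[0]]

print(interpret("jump right"))         # ['RTURN', 'JUMP']
print(interpret("jump around right"))  # ['RTURN', 'JUMP'] repeated four times

The point of the abstract’s contrast is visible here: a model that has memorized many “X around right” training examples can fill the X slot, but inferring the whole “around right” construction from “right” and “around” alone requires applying the rules above compositionally.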

[1] L. Dezsö et al. Universal Grammar. Certainty in Action, 1981.

[2] J. Fodor et al. Connectionism and cognitive architecture: A critical analysis. Cognition, 1988.

[3] I. Leuthäusser. Neural network methods. 1991.

[4] M. H. Christiansen et al. Generalization and connectionist language learning. 1994.

[5] G. Marcus. Rethinking Eliminative Connectionism. Cognitive Psychology, 1998.

[6] S. Phillips et al. Are Feedforward and Recurrent Networks Systematic? Analysis and Implications for a Connectionist Cognitive Architecture. Connection Science, 1998.

[7] G. Marcus. The Algebraic Mind: Integrating Connectionism and Cognitive Science. 2001.

[8] F. Chang et al. Symbolically speaking: a connectionist model of sentence production. Cognitive Science, 2002.

[9] F. van der Velde et al. Lack of combinatorial productivity in language processing with simple recurrent networks. Connection Science, 2004.

[10] P. Brakel et al. Strong systematicity in sentence processing by simple recurrent networks. 2009.

[11] M. F. Damian et al. A fundamental limitation of the conjunctive codes learned in PDP models of cognition: comment on Botvinick and Plaut (2006). Psychological Review, 2009.

[12] M. M. Botvinick et al. Empirical and computational support for context-dependent representations of serial order: reply to Bowers, Damian, and Davis (2009). Psychological Review, 2009.

[13] Y. Bengio et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches. SSST@EMNLP, 2014.

[14] S. L. Frank et al. Getting real about systematicity. 2014.

[15] Q. V. Le et al. Sequence to Sequence Learning with Neural Networks. NIPS, 2014.

[16] G. Zhang et al. Deep Learning. Int. J. Semantic Comput., 2016.

[17] J. B. Tenenbaum et al. Building machines that learn and think like people. Behavioral and Brain Sciences, 2016.

[18] A. K. Lampinen et al. One-shot and few-shot learning of word embeddings. arXiv, 2017.

[19] M. Baroni et al. High-risk learning: acquiring new word vectors from tiny data. EMNLP, 2017.

[20] B. M. Lake and M. Baroni. Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. ICML, 2018.