Compositional generalization in a deep seq2seq model by separating syntax and semantics

Standard methods in deep learning for natural language processing fail to capture the compositional structure of human language that allows for systematic generalization outside of the training distribution. In contrast, human learners readily generalize in this way, e.g., by applying known grammatical rules to novel words. Inspired by work in neuroscience suggesting separate brain systems for syntactic and semantic processing, we implement a modification to standard approaches in neural machine translation that imposes an analogous separation. The resulting model, which we call Syntactic Attention, substantially outperforms standard deep learning methods on the SCAN dataset, a compositional generalization benchmark, without any hand-engineered features or additional supervision. Our work suggests that separating syntactic from semantic learning may be a useful heuristic for capturing compositional structure.
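The core idea can be illustrated with a minimal sketch (our own illustrative PyTorch code, not the authors' released implementation; the module name, dimensions, and scoring function below are assumptions): alignment, i.e., where the decoder attends in the input, is computed only from a recurrent "syntactic" encoding of the source, while the content actually passed to the decoder is a separate, non-recurrent "semantic" embedding of each word.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SyntacticAttention(nn.Module):
    """Sketch of attention that separates syntax (alignment) from semantics (content).

    A biLSTM over the source provides the keys used to score attention; the
    values mixed into the decoder context are plain word embeddings with no
    recurrence. Hyperparameters are illustrative, not the paper's settings.
    """

    def __init__(self, vocab_size, sem_dim=64, syn_dim=64):
        super().__init__()
        self.semantic_emb = nn.Embedding(vocab_size, sem_dim)   # "what each word means"
        self.syntactic_emb = nn.Embedding(vocab_size, syn_dim)
        self.syntax_rnn = nn.LSTM(syn_dim, syn_dim,
                                  bidirectional=True, batch_first=True)
        self.query_proj = nn.Linear(2 * syn_dim, 2 * syn_dim)

    def forward(self, src_tokens, decoder_query):
        # src_tokens: (batch, src_len); decoder_query: (batch, 2 * syn_dim)
        sem = self.semantic_emb(src_tokens)                      # (batch, src_len, sem_dim)
        syn, _ = self.syntax_rnn(self.syntactic_emb(src_tokens)) # (batch, src_len, 2*syn_dim)

        # Alignment scores are computed ONLY from the syntactic stream.
        q = self.query_proj(decoder_query).unsqueeze(2)          # (batch, 2*syn_dim, 1)
        scores = torch.bmm(syn, q).squeeze(2)                    # (batch, src_len)
        attn = F.softmax(scores, dim=1)

        # The context passed to the decoder mixes ONLY semantic embeddings.
        context = torch.bmm(attn.unsqueeze(1), sem).squeeze(1)   # (batch, sem_dim)
        return context, attn
```

The point of the split is that the network can learn how to align outputs to inputs (syntax) independently of what each word means (semantics), which is the property the abstract credits for the model's compositional generalization on SCAN: a grammatical pattern learned over familiar words transfers to a novel word whose semantic embedding is new but whose syntactic role is the same.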
