Systematic Generalization: What Is Required and Can It Be Learned?

Numerous models for grounded language understanding have recently been proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated. We compare both types of models on how well they lend themselves to a particular form of systematic generalization. Using a synthetic VQA test, we evaluate which models can reason about all possible object pairs after training on only a small subset of them. Our findings show that the generalization of modular models is much more systematic, but also highly sensitive to the module layout, i.e., to how exactly the modules are connected. We furthermore investigate whether modular models that generalize well can be made more end-to-end by learning their layout and parametrization. We find that end-to-end methods from prior work often learn inappropriate layouts or parametrizations that do not facilitate systematic generalization. Our results suggest that, in addition to modularity, systematic generalization in language understanding may require explicit regularizers or priors.
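
To make the evaluation protocol concrete, here is a minimal Python sketch of the kind of split the abstract describes: each left-hand object co-occurs with only a few right-hand objects at training time, while the test set covers the full cross-product of pairs. The function make_systematic_split, the letter vocabulary, and the pairs_per_lhs parameter are illustrative assumptions, not the paper's actual dataset construction.

```python
import itertools
import random

def make_systematic_split(objects, pairs_per_lhs, seed=0):
    """Hold out object pairs systematically: training sees only
    `pairs_per_lhs` right-hand objects per left-hand object, while
    the test set covers every possible ordered pair."""
    rng = random.Random(seed)
    train = []
    for x in objects:
        candidates = [y for y in objects if y != x]
        for y in rng.sample(candidates, pairs_per_lhs):
            train.append((x, y))
    # Evaluate on the full cross-product of distinct object pairs.
    test = [(x, y) for x, y in itertools.product(objects, repeat=2) if x != y]
    return train, test

# 10 objects, 2 right-hand objects per left-hand object at training time:
# the model trains on 20 of the 90 possible pairs but is tested on all 90.
objects = [chr(ord("A") + i) for i in range(10)]
train_pairs, test_pairs = make_systematic_split(objects, pairs_per_lhs=2)
print(len(train_pairs), len(test_pairs))  # -> 20 90
```

Under such a split, systematic generalization means answering questions correctly about the 70 object pairs that never co-occurred during training.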
