Learning compositionally through attentive guidance

While neural network models have been successfully applied to domains that require substantial generalisation skills, recent studies suggest that they struggle when the task they are trained on requires inferring its underlying compositional structure. In this paper, we introduce Attentive Guidance, a mechanism to direct a sequence-to-sequence model equipped with attention towards more compositional solutions. We test it on two tasks devised precisely to assess the compositional capabilities of neural models, and we show that vanilla sequence-to-sequence models with attention overfit the training distribution, while the guided versions find compositional solutions that fit the training and testing distributions almost equally well. Moreover, the learned solutions generalise even in cases where the training and testing distributions strongly diverge. In this way, we demonstrate that sequence-to-sequence models are capable of finding compositional solutions without requiring extra components. These results help disentangle the causes of the lack of systematic compositionality in neural networks, which can in turn fuel future work.
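The abstract does not spell out how the guidance signal is applied; as an illustration only, here is a minimal sketch of one plausible realisation, assuming (hypothetically) that guidance takes the form of an auxiliary loss that pulls the decoder's attention weights towards target alignments supplied with the training data. The function names, the `lambda_guidance` weight, and the tensor shapes are assumptions for this sketch, not the paper's specification.

```python
# Hedged sketch, not the paper's exact formulation: guide attention by adding
# an auxiliary loss between predicted attention weights and target alignments.
import torch
import torch.nn.functional as F

def attentive_guidance_loss(attention_weights, target_alignments, eps=1e-8):
    """Cross-entropy between predicted attention and target (oracle) alignments.

    attention_weights: (batch, target_len, source_len), rows summing to 1.
    target_alignments: (batch, target_len, source_len), one-hot or soft targets.
    """
    return -(target_alignments * torch.log(attention_weights + eps)).sum(-1).mean()

def training_step(logits, targets, attention_weights, target_alignments,
                  lambda_guidance=1.0):
    # Usual token-level prediction loss; logits: (batch, target_len, vocab).
    task_loss = F.cross_entropy(logits.transpose(1, 2), targets)
    # Auxiliary guidance term, weighted by a hypothetical hyperparameter.
    guidance_loss = attentive_guidance_loss(attention_weights, target_alignments)
    return task_loss + lambda_guidance * guidance_loss
```

Under this reading, the base sequence-to-sequence model is unchanged; only the training objective gains an extra term, which is consistent with the abstract's claim that no extra components are required at inference time.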
