Learning to Recombine and Resample Data for Compositional Generalization

Flexible neural models outperform grammar- and automaton-based counterparts on a variety of sequence modeling tasks. However, neural models perform poorly in settings requiring compositional generalization beyond the training data, particularly to rare or unseen subsequences. Past work has found symbolic scaffolding (e.g., grammars or automata) essential in these settings. Here we present a family of learned data augmentation schemes that support a large category of compositional generalizations without appeal to latent symbolic structure. Our approach to data augmentation has two components: recombination of original training examples via a prototype-based generative model, and resampling of generated examples to encourage extrapolation. Training an ordinary neural sequence model on a dataset augmented with recombined and resampled examples significantly improves generalization in two language processing problems, instruction following (SCAN) and morphological analysis (SIGMORPHON 2018), where our approach enables learning of new constructions and tenses from as few as eight initial examples.
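
The sketch below is a minimal, hypothetical illustration of the two-step pipeline, not the paper's actual method: the learned prototype-based recombination model is approximated by a single-token substitution heuristic over aligned example pairs, and resampling is approximated by upweighting candidates that contain tokens rare in the original data. All function names and the toy SCAN-style examples are invented for illustration.

```python
# Illustrative recombine-and-resample augmentation sketch (not the paper's model).
import random
from collections import Counter

def mine_swaps(pairs):
    """Find interchangeable (input-token, output-token) pairs: two examples
    that differ in exactly one input token and one output token."""
    swaps = set()
    for x1, y1 in pairs:
        for x2, y2 in pairs:
            if (x1, y1) == (x2, y2) or len(x1) != len(x2) or len(y1) != len(y2):
                continue
            dx = [i for i, (a, b) in enumerate(zip(x1, x2)) if a != b]
            dy = [i for i, (a, b) in enumerate(zip(y1, y2)) if a != b]
            if len(dx) == 1 and len(dy) == 1:
                swaps.add(((x1[dx[0]], y1[dy[0]]), (x2[dx[0]], y2[dy[0]])))
    return swaps

def recombine(pairs, swaps):
    """Propose new examples by applying a mined swap wherever it matches,
    treating an existing example as the 'prototype' being edited."""
    new = set()
    for x, y in pairs:
        for (xi, yi), (xo, yo) in swaps:
            if xi in x and yi in y:
                x_new = tuple(xo if t == xi else t for t in x)
                y_new = tuple(yo if t == yi else t for t in y)
                if (x_new, y_new) not in pairs:
                    new.add((x_new, y_new))
    return sorted(new)

def resample(pairs, candidates, n_keep=20, rng=random):
    """Keep candidates in proportion to how rare their input tokens are in
    the original data, pushing augmentation toward underrepresented items."""
    if not candidates:
        return []
    counts = Counter(tok for x, _ in pairs for tok in x)
    weights = [sum(1.0 / (1 + counts[t]) for t in x) for x, _ in candidates]
    return rng.choices(candidates, weights=weights, k=min(n_keep, len(candidates)))

# Toy SCAN-style data: commands paired with action sequences.
train = [
    (("jump",), ("JUMP",)),
    (("walk",), ("WALK",)),
    (("look",), ("LOOK",)),
    (("walk", "twice"), ("WALK", "WALK")),
    (("look", "twice"), ("LOOK", "LOOK")),
]
augmented = resample(train, recombine(train, mine_swaps(train)))
# May propose (("jump", "twice"), ("JUMP", "JUMP")) even though "jump"
# never co-occurs with "twice" in the original data; an ordinary seq2seq
# model is then trained on train + augmented.
```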
