Lexicon Learning for Few-Shot Sequence Modeling

Sequence-to-sequence transduction is the core problem in language processing applications as diverse as semantic parsing, machine translation, and instruction following. The neural network models that provide the dominant solution to these problems are brittle, especially in low-resource settings: they fail to generalize correctly or systematically from small datasets. Past work has shown that many failures of systematic generalization arise from neural models' inability to disentangle lexical phenomena from syntactic ones. To address this, we augment neural decoders with a lexical translation mechanism that generalizes existing copy mechanisms to incorporate learned, decontextualized, token-level translation rules. We describe how to initialize this mechanism using a variety of lexicon learning algorithms, and show that it improves systematic generalization on a diverse set of sequence modeling tasks drawn from cognitive science, formal semantics, and machine translation.
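The lexical translation mechanism described above can be sketched as a mixture: with some gate weight, the decoder emits from its ordinary generation distribution; otherwise it attends to source tokens and, instead of copying them verbatim as a standard copy mechanism would, emits their lexicon translations. The sketch below is a minimal illustration under assumed names (`p_gen`, `attention`, `lexicon`, `gate` are illustrative, not the paper's implementation), with the gate fixed rather than predicted by the network.

```python
def lexical_translation_distribution(p_gen, attention, source_tokens,
                                     lexicon, gate):
    """Mix the decoder's generation distribution with lexicon-backed
    translations of attended source tokens.

    p_gen         : dict target_token -> prob from the decoder softmax
    attention     : list of attention weights, one per source position
    source_tokens : list of source-side tokens
    lexicon       : dict source_token -> dict target_token -> prob
                    (learned, decontextualized translation rules)
    gate          : scalar in [0, 1]; weight on the lexical pathway
    """
    # Generation pathway, down-weighted by the gate.
    out = {tok: (1.0 - gate) * p for tok, p in p_gen.items()}
    # Lexical pathway: route attention mass through the lexicon.
    for weight, src in zip(attention, source_tokens):
        # Fall back to verbatim copying for out-of-lexicon tokens,
        # recovering the behavior of a plain copy mechanism.
        for tgt, p in lexicon.get(src, {src: 1.0}).items():
            out[tgt] = out.get(tgt, 0.0) + gate * weight * p
    return out

# Toy example: "dax" is a rare word the lexicon maps to "RED".
p = lexical_translation_distribution(
    p_gen={"RED": 0.2, "JUMP": 0.8},
    attention=[0.9, 0.1],
    source_tokens=["dax", "twice"],
    lexicon={"dax": {"RED": 1.0}},
    gate=0.5,
)
```

Because both pathways are proper distributions and the gate is a convex weight, the output also sums to one; the mechanism's benefit is that a token seen only in the lexicon (here "dax") can still be translated correctly at test time.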
