Structured Reordering for Modeling Latent Alignments in Sequence Transduction

Despite success in many domains, neural models struggle in settings where train and test examples are drawn from different distributions. In particular, in contrast to humans, conventional sequence-to-sequence (seq2seq) models fail to generalize systematically, i.e., to interpret sentences representing novel combinations of concepts (e.g., text segments) seen in training. Traditional grammar formalisms excel in such settings by implicitly encoding alignments between input and output segments, but they are hard to scale and maintain. Instead of engineering a grammar, we directly model segment-to-segment alignments as discrete structured latent variables within a neural seq2seq model. To efficiently explore the large space of alignments, we introduce a reorder-first, align-later framework whose central component is a neural reordering module that produces separable permutations. We present an efficient dynamic programming algorithm that performs exact marginal and MAP inference over separable permutations, thus enabling end-to-end differentiable training of our model. The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and on NLP tasks (semantic parsing and machine translation).
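
The combinatorial object at the heart of the abstract is the class of separable permutations. A minimal illustrative sketch follows (not the authors' implementation; the helper names `direct_sum`, `skew_sum`, and `separable` are hypothetical). It uses the standard recursive characterization: every separable permutation of length at least two decomposes as a direct sum or a skew sum of two shorter separable permutations, and this recursive block structure is what makes CYK-style dynamic programming over the class tractable.

```python
# Sketch: enumerate separable permutations via their recursive decomposition.
# Equivalently, these are the permutations avoiding the patterns 2413 and 3142.
from functools import lru_cache

def direct_sum(p, q):
    """Direct sum: append q after p, with q's values shifted above p's."""
    return p + tuple(v + len(p) for v in q)

def skew_sum(p, q):
    """Skew sum: append q after p, with p's values shifted above q's."""
    return tuple(v + len(q) for v in p) + q

@lru_cache(maxsize=None)
def separable(n):
    """All separable permutations of length n (values 0..n-1, as tuples)."""
    if n == 1:
        return {(0,)}
    perms = set()
    for k in range(1, n):  # split point between the two component blocks
        for p in separable(k):
            for q in separable(n - k):
                perms.add(direct_sum(p, q))
                perms.add(skew_sum(p, q))
    return perms

if __name__ == "__main__":
    # Counts follow the large Schröder numbers: 1, 2, 6, 22, 90, ...
    for n in range(1, 6):
        print(n, len(separable(n)))
```

Enumeration is only feasible for small n, of course; the point of the decomposition is that marginal and MAP inference can instead be computed by a dynamic program over the split points, without materializing the (exponentially large) set of permutations.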
