Compositional Generalization for Neural Semantic Parsing via Span-level Supervised Attention

We describe a span-level supervised attention loss that improves compositional generalization in semantic parsers. Our approach builds on existing losses that encourage attention maps in neural sequence-to-sequence models to imitate the output of classical word alignment algorithms. Where past work has used word-level alignments, we focus on spans; borrowing ideas from phrase-based machine translation, we align subtrees in semantic parses to spans of input sentences, and encourage neural attention mechanisms to mimic these alignments. This method improves the performance of transformers, RNNs, and structured decoders on three benchmarks of compositional generalization.
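The supervised attention loss sketched in the abstract (pushing the decoder's attention mass onto the source span aligned to each output subtree) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `span_attention_loss`, the list-of-lists attention format, and the simplifying assumption of one gold span per output token are all hypothetical choices made for illustration.

```python
import math

def span_attention_loss(attn, gold_spans, eps=1e-9):
    """Penalize attention mass that falls outside gold-aligned spans.

    attn       -- attn[t] is the decoder's attention distribution over
                  source tokens at output step t (rows sum to 1)
    gold_spans -- gold_spans[t] = (start, end) half-open source span
                  aligned to output step t (simplified: one span per step)

    Returns the mean negative log of the attention mass placed on the
    gold span; the loss is ~0 when attention lies entirely inside it.
    """
    total = 0.0
    for t, (start, end) in enumerate(gold_spans):
        # probability mass the model places inside the aligned span
        mass = sum(attn[t][start:end])
        total += -math.log(max(mass, eps))  # eps guards log(0)
    return total / len(gold_spans)
```

In training, a term like this would be added (with a weight) to the usual sequence-to-sequence likelihood, so that attention both serves decoding and mimics the phrase-style alignments.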
