Bridging the Gap between Synthetic and Natural Questions via Sentence Decomposition for Semantic Parsing

Semantic parsing maps natural language questions into logical forms, which can be executed against a knowledge base to obtain answers. In real-world applications, the performance of a parser is often limited by the lack of training data. To facilitate zero-shot learning, data synthesis has been widely studied as a way to automatically generate paired questions and logical forms. However, data synthesis methods rarely cover the diverse structures found in natural language, leading to a large gap in sentence structure between synthetic and natural questions. In this paper, we propose a decomposition-based method that unifies the sentence structures of questions, which improves generalization to natural questions. Experiments demonstrate that our method significantly improves a semantic parser trained on synthetic data (+7.9% on KQA and +8.9% on ComplexWebQuestions in terms of exact match accuracy). Extensive analysis shows that, compared with baselines, our method generalizes better to natural questions with novel text expressions. Beyond semantic parsing, our idea potentially benefits other semantic understanding tasks by reducing the influence of distracting structural features. To illustrate this, we extend our method to the task of sentence embedding learning and observe substantial improvements on sentence retrieval (+13.1% for Hit@1).
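The two metrics quoted above, exact match accuracy and Hit@1, can be illustrated with a minimal sketch. The function names, the whitespace normalization, and the toy inputs are assumptions made for illustration; they are not taken from the paper's evaluation code, which may normalize logical forms differently.

```python
from typing import Sequence


def _normalize(text: str) -> str:
    # Assumption: only whitespace differences are ignored when comparing.
    return " ".join(text.split())


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predicted logical forms that equal the reference string."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(_normalize(p) == _normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def hit_at_1(rankings: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Fraction of queries whose top-ranked retrieved sentence is the gold one."""
    if len(rankings) != len(gold):
        raise ValueError("rankings and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(bool(ranking) and ranking[0] == g for ranking, g in zip(rankings, gold))
    return hits / len(gold)


if __name__ == "__main__":
    # Hypothetical toy data: one of two predictions matches its reference.
    print(exact_match_accuracy(["Find(a)  Count()", "Find(b)"], ["Find(a) Count()", "Find(c)"]))
    # Hypothetical toy data: the gold sentence is ranked first for one of two queries.
    print(hit_at_1([["s1", "s2"], ["s2", "s1"]], ["s1", "s1"]))
```

Under these assumptions, a reported gain such as +7.9% corresponds to a difference between two such accuracy values computed on the same test set.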
