Grammar-based Neural Text-to-SQL Generation

The sequence-to-sequence paradigm employed by neural text-to-SQL models typically performs token-level decoding and does not consider generating SQL hierarchically from a grammar. Grammar-based decoding has shown significant improvements for other semantic parsing tasks, but SQL and other general programming languages have complexities not present in logical formalisms that make writing hierarchical grammars difficult. We introduce techniques to handle these complexities, showing how to construct a schema-dependent grammar with minimal over-generation. We analyze these techniques on ATIS and Spider, two challenging text-to-SQL datasets, demonstrating that they yield 14--18\% relative reductions in error.

[1]  Chen Liang,et al.  Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision , 2016, ACL.

[2]  Peter Thanisch,et al.  Natural language interfaces to databases – an introduction , 1995, Natural Language Engineering.

[3]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[4]  Oren Etzioni,et al.  Towards a theory of natural language interfaces to databases , 2003, IUI '03.

[5]  Dragomir R. Radev,et al.  Improving Text-to-SQL Evaluation Methodology , 2018, ACL.

[6]  Luke S. Zettlemoyer,et al.  Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars , 2005, UAI.

[7]  George R. Doddington,et al.  The ATIS Spoken Language Systems Pilot Corpus , 1990, HLT.

[8]  Tao Yu,et al.  TypeSQL: Knowledge-Based Type-Aware Neural Text-to-SQL Generation , 2018, NAACL.

[9]  Yoav Artzi,et al.  Learning to Map Context-Dependent Sentences to Executable Formal Queries , 2018, NAACL.

[10]  Alvin Cheung,et al.  Mapping Language to Code in Programmatic Context , 2018, EMNLP.

[11]  Raymond J. Mooney,et al.  Learning to Parse Database Queries Using Inductive Logic Programming , 1996, AAAI/IAAI, Vol. 2.

[12]  Dan Klein,et al.  Abstract Syntax Networks for Code Generation and Semantic Parsing , 2017, ACL.

[13]  Dan Klein,et al.  Learning Accurate, Compact, and Interpretable Tree Annotation , 2006, ACL.

[14]  Bryan Ford,et al.  Parsing expression grammars: a recognition-based syntactic foundation , 2004, POPL '04.

[15]  Jonathan Berant,et al.  Decoupling Structure and Lexicon for Zero-Shot Semantic Parsing , 2018, EMNLP.

[16]  Dan Klein,et al.  Learning Dependency-Based Compositional Semantics , 2011, CL.

[17]  Mirella Lapata,et al.  Language to Logical Form with Neural Attention , 2016, ACL.

[18]  Tao Yu,et al.  Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task , 2018, EMNLP.

[19]  Tommi S. Jaakkola,et al.  Tree-structured decoding with doubly-recurrent neural networks , 2016, ICLR.

[20]  Dawn Xiaodong Song,et al.  SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning , 2017, ArXiv.

[21]  Tao Yu,et al.  SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task , 2018, EMNLP.

[22]  Marco Baroni,et al.  Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks , 2017, ICML.

[23]  Alvin Cheung,et al.  Learning a Neural Semantic Parser from User Feedback , 2017, ACL.

[24]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[25]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[26]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[27]  Richard Socher,et al.  Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning , 2018, ArXiv.

[28]  Jayant Krishnamurthy,et al.  Neural Semantic Parsing with Type Constraints for Semi-Structured Tables , 2017, EMNLP.

[29]  Luke S. Zettlemoyer,et al.  AllenNLP: A Deep Semantic Natural Language Processing Platform , 2018, ArXiv.

[30]  Luke S. Zettlemoyer,et al.  Context-dependent Semantic Parsing for Time Expressions , 2014, ACL.

[31]  Hang Li,et al.  “ Tony ” DNN Embedding for “ Tony ” Selective Read for “ Tony ” ( a ) Attention-based Encoder-Decoder ( RNNSearch ) ( c ) State Update s 4 SourceVocabulary Softmax Prob , 2016 .

[32]  Graham Neubig,et al.  TRANX: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation , 2018, EMNLP.

[33]  Fei Li,et al.  Constructing an Interactive Natural Language Interface for Relational Databases , 2014, Proc. VLDB Endow..