Neural Approaches for Natural Language Interfaces to Databases: A Survey

A natural language interface to databases (NLIDB) lets users without technical expertise access information in relational databases. Interest in NLIDBs has resurged in recent years, driven by the availability of large datasets and improvements in neural sequence-to-sequence models. In this survey we focus on the key design decisions behind current state-of-the-art neural approaches, which we group into encoder and decoder improvements. We highlight the three most important directions: linking question tokens to database schema elements (schema linking), better architectures for encoding the textual query together with the schema (schema encoding), and improved generation of structured queries with autoregressive neural models (grammar-based decoders). To foster future research, we also present an overview of the most important NLIDB datasets, a comparison of the top-performing neural models, and a brief look at recent non-deep-learning solutions.
