Re-examining the Role of Schema Linking in Text-to-SQL

In existing sophisticated text-to-SQL models, schema linking is often treated as a simple, minor component, which belies its importance. By providing a schema linking corpus based on the Spider text-to-SQL dataset, we systematically study the role of schema linking. We also build a simple BERT-based baseline, called Schema-Linking SQL (SLSQL), to perform a data-driven study. We find that when schema linking is done well, SLSQL demonstrates good performance on Spider despite its structural simplicity. Many of the remaining errors are attributable to corpus noise. This suggests that schema linking is the crux of the current text-to-SQL task. Our analytic studies provide insights into the characteristics of schema linking for the future development of text-to-SQL tasks.
