DBPal: Weak Supervision for Learning a Natural Language Interface to Databases

This paper describes DBPal, a new system that translates natural language utterances into SQL statements using a neural machine translation model. Other recent approaches also use neural machine translation to implement a Natural Language Interface to Databases (NLIDB), but they rely on supervised learning with manually curated training data, which imposes substantial overhead for supporting each new database schema. To avoid this overhead, DBPal implements a novel training pipeline based on weak supervision that synthesizes all training data from a given database schema. In our evaluation, we show that DBPal can outperform existing rule-based NLIDBs and, without relying on manually curated training data for every new database schema, achieve performance comparable to other NLIDBs that leverage deep neural network models.
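
The abstract does not spell out how training data is synthesized from a schema. As a rough illustration only, the sketch below assumes one plausible approach: a handful of seed templates that pair a natural language pattern with a SQL pattern, instantiated with the table and column names of a schema. The schema, the templates, the `@VALUE` placeholder, and the function name `synthesize_pairs` are all hypothetical and are not taken from the paper.

```python
from itertools import permutations

# Hypothetical schema (table name -> column names); not from the paper.
SCHEMA = {
    "patients": ["name", "age", "diagnosis"],
}

# Hypothetical seed templates: (natural language pattern, SQL pattern).
# "@VALUE" is an assumed placeholder standing in for a constant.
TEMPLATES = [
    ("show the {col} of all {table}",
     "SELECT {col} FROM {table}"),
    ("what is the {col} of {table} whose {filter_col} is @VALUE",
     "SELECT {col} FROM {table} WHERE {filter_col} = @VALUE"),
]


def synthesize_pairs(schema, templates):
    """Instantiate every template with every table/column combination."""
    pairs = []
    for table, columns in schema.items():
        for nl_tpl, sql_tpl in templates:
            if "{filter_col}" in nl_tpl:
                # Templates with a filter need two distinct columns.
                slot_sets = [
                    {"table": table, "col": col, "filter_col": filter_col}
                    for col, filter_col in permutations(columns, 2)
                ]
            else:
                slot_sets = [{"table": table, "col": col} for col in columns]
            for slots in slot_sets:
                pairs.append((nl_tpl.format(**slots), sql_tpl.format(**slots)))
    return pairs


if __name__ == "__main__":
    for nl, sql in synthesize_pairs(SCHEMA, TEMPLATES):
        print(f"{nl}  ->  {sql}")
```

Pairs produced this way need no manual annotation, which is the property the abstract attributes to weak supervision. A realistic pipeline would also need to vary the natural language side so that a translation model trained on it is not limited to the template wording.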
