Duoquest: A Dual-Specification System for Expressive SQL Queries

Querying a relational database is difficult because it requires users to be familiar with both SQL and the database schema. However, many users have enough domain expertise to describe their desired queries by other means. For such users, the two major alternatives to writing SQL are natural language interfaces (NLIs) and programming-by-example (PBE). Both have pitfalls: natural language queries (NLQs) are often ambiguous, even to human interpreters, while current PBE approaches restrict functionality to remain tractable. Consequently, we propose dual-specification query synthesis, which takes as input both an NLQ and an optional PBE-like table sketch query that lets users express varying levels of domain knowledge. We introduce Duoquest, a novel dual-specification system that leverages guided partial query enumeration to efficiently explore the space of possible queries. We present results from user studies in which Duoquest demonstrates a 62.5% absolute increase in query construction accuracy over a state-of-the-art NLI, and accuracy comparable to a PBE system on the limited workload that the PBE system supports. In a simulation study on the Spider benchmark, Duoquest demonstrates a more than 2x increase in top-1 accuracy over both NLI and PBE.
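To make the dual-specification idea concrete, the following is a minimal sketch, not Duoquest's actual algorithm, of how a table sketch could disambiguate candidate queries that an NLI might propose for one NLQ. It assumes a simplified sketch format (a list of example tuples in which `None` is a wildcard cell) and filters complete candidates after execution, whereas the abstract describes guided enumeration of partial queries. The names `matches_sketch` and `filter_candidates`, the `movie` table, and the candidate queries are all hypothetical.

```python
import sqlite3


def matches_sketch(rows, sketch):
    """Return True if every example tuple in the sketch matches some result row.

    A None cell in an example tuple is treated as a wildcard (assumed format).
    """
    for example in sketch["tuples"]:
        found = any(
            len(row) == len(example)
            and all(cell is None or cell == value
                    for cell, value in zip(example, row))
            for row in rows
        )
        if not found:
            return False
    return True


def filter_candidates(conn, candidates, sketch):
    """Keep only the candidate SQL strings whose results satisfy the sketch."""
    kept = []
    for sql in candidates:
        rows = conn.execute(sql).fetchall()
        if matches_sketch(rows, sketch):
            kept.append(sql)
    return kept


# Toy database standing in for the user's schema (hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (title TEXT, year INTEGER);
INSERT INTO movie VALUES ('Alien', 1979), ('Aliens', 1986), ('Heat', 1995);
""")

# Two readings an NLI might plausibly produce for an ambiguous NLQ.
candidates = [
    "SELECT title, year FROM movie WHERE year < 1990",
    "SELECT title, year FROM movie WHERE year > 1990",
]

# The user knows that 'Alien' should appear in the output, but not its year.
sketch = {"tuples": [("Alien", None)]}

survivors = filter_candidates(conn, candidates, sketch)
print(survivors)
```

In this toy setting, a single partially specified example tuple is enough to rule out the second reading, which illustrates how a sketch can let users contribute whatever output knowledge they have without writing full input-output examples.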
