Computer-Assisted Query Formulation

Database management systems DBMS typically provide an applicationprogramming interface for users to issue queries using querylanguages such as SQL. Many such languages were originally designedfor business data processing applications. While these applications arestill relevant, two other classes of applications have become importantusers of data management systems: a web applications that issuequeries programmatically to the DBMS, and b data analytics involvingcomplex queries that allow data scientists to better understand theirdatasets. Unfortunately, existing query languages provided by databasemanagement systems are often far from ideal for these application domains.In this tutorial, we describe a set of technologies that assist users inspecifying database queries for different application domains. The goalof such systems is to bridge the gap between current query interfacesprovided by database management systems and the needs of differentusage scenarios that are not well served by existing query languages.We discuss the different interaction modes that such systems provideand the algorithms used to infer user queries. In particular, we focuson a new class of systems built using program synthesis techniques,and furthermore discuss opportunities in combining synthesis and othermethods used in prior systems to infer user queries.

[1]  Olga Papaemmanouil,et al.  Explore-by-example: an automatic query steering framework for interactive data exploration , 2014, SIGMOD Conference.

[2]  Alvin Cheung,et al.  Bridging the Gap Between General-Purpose and Domain-Specific Compilers with Synthesis , 2015, SNAPL.

[3]  Xiaolei Qian,et al.  The deductive synthesis of database transactions , 1993, TODS.

[4]  H. V. Jagadish,et al.  NaLIX: A generic natural language search environment for XML data , 2007, TODS.

[5]  Srinivasan Parthasarathy,et al.  Query by output , 2009, SIGMOD Conference.

[6]  Alexander Aiken,et al.  Stochastic superoptimization , 2012, ASPLOS '13.

[7]  William R. Cook,et al.  Extracting queries by static analysis of transparent persistence , 2007, POPL '07.

[8]  Dan Suciu,et al.  SnipSuggest: Context-Aware Autocompletion for SQL , 2010, Proc. VLDB Endow..

[9]  E. F. CODD,et al.  A relational model of data for large shared data banks , 1970, CACM.

[10]  Oren Etzioni,et al.  Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability , 2004, COLING.

[11]  Manu Sridharan,et al.  TAJ: effective taint analysis of web applications , 2009, PLDI '09.

[12]  Goetz Graefe,et al.  Hash Joins and Hash Teams in Microsoft SQL Server , 1998, VLDB.

[13]  Dan Suciu,et al.  A Case for A Collaborative Query Management System , 2009, CIDR.

[14]  Sumit Gulwani,et al.  Test-driven synthesis , 2014, PLDI.

[15]  Fei Li,et al.  Constructing an Interactive Natural Language Interface for Relational Databases , 2014, Proc. VLDB Endow..

[16]  Bongshin Lee,et al.  SketchStory: Telling More Engaging Stories with Data through Freeform Sketching , 2013, IEEE Transactions on Visualization and Computer Graphics.

[17]  Emina Torlak,et al.  Kodkod: A Relational Model Finder , 2007, TACAS.

[18]  Willy Zwaenepoel,et al.  JReq: Database Queries in Imperative Languages , 2010, CC.

[19]  P.J. Haas,et al.  Sampling-based selectivity estimation for joins using augmented frequent value statistics , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[20]  Neoklis Polyzotis,et al.  Query Recommendations for Interactive Database Exploration , 2009, SSDBM.

[21]  Stephan Merz,et al.  Exploiting Symmetry in SMT Problems , 2011, CADE.

[22]  Zohar Manna,et al.  Deductive Synthesis of the Unification Algorithm , 1981, Sci. Comput. Program..

[23]  Sumit Gulwani,et al.  Automated feedback generation for introductory programming assignments , 2013, PLDI.

[24]  Edsger W. Dijkstra,et al.  Guarded commands, nondeterminacy and formal derivation of programs , 1975, Commun. ACM.

[25]  Edmund M. Clarke,et al.  Counterexample-guided abstraction refinement , 2003, 10th International Symposium on Temporal Representation and Reasoning, 2003 and Fourth International Conference on Temporal Logic. Proceedings..

[26]  David Maier,et al.  Rapid bushy join-order optimization with Cartesian products , 1996, SIGMOD '96.

[27]  Sumit Gulwani,et al.  Oracle-guided component-based program synthesis , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[28]  Lawrence A. Rowe,et al.  Data abstraction, views and updates in RIGEL , 1979, SIGMOD '79.

[29]  Alvin Cheung,et al.  Demo: Automatically generating interesting events with LifeJoin , 2011, SenSys.

[30]  Jonathan Traugott Deductive Synthesis of Sorting Programs , 1989, J. Symb. Comput..

[31]  R. van Renesse,et al.  An experiment in formal design using meta-properties , 2001, Proceedings DARPA Information Survivability Conference and Exposition II. DISCEX'01.

[32]  H. V. Jagadish,et al.  NaLIR: an interactive natural language interface for querying relational databases , 2014, SIGMOD Conference.

[33]  Michael Stonebraker,et al.  Data Curation at Scale: The Data Tamer System , 2013, CIDR.

[34]  Sumit Gulwani,et al.  Synthesis of loop-free programs , 2011, PLDI '11.

[35]  Ian H. Witten,et al.  Generating Accurate Rule Sets Without Global Optimization , 1998, ICML.

[36]  Björn Hartmann,et al.  Quicksilver: Automatic Synthesis of Relational Queries , 2013 .

[37]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[38]  Sumit Gulwani,et al.  NLyze: interactive programming by natural language for spreadsheet data analysis and manipulation , 2014, SIGMOD Conference.

[39]  Goetz Graefe,et al.  Query evaluation techniques for large databases , 1993, CSUR.

[40]  Willy Zwaenepoel,et al.  HadoopToSQL: a mapReduce query optimizer , 2010, EuroSys '10.

[41]  C. J. Date An introduction to database systems (7. ed.) , 1999 .

[42]  Sumit Gulwani,et al.  Spreadsheet data manipulation using examples , 2012, CACM.

[43]  Arnab Nandi,et al.  GestureQuery: A Multitouch Database Query Interface , 2013, Proc. VLDB Endow..

[44]  Moshé M. Zloof Query-by-example: the invocation and definition of tables and forms , 1975, VLDB '75.

[45]  Jeffrey F. Naughton,et al.  Practical selectivity estimation through adaptive sampling , 1990, SIGMOD '90.

[46]  Oren Etzioni,et al.  Towards a theory of natural language interfaces to databases , 2003, IUI '03.

[47]  Zohar Manna,et al.  Fundamentals of Deductive Program Synthesis , 1992, IEEE Trans. Software Eng..

[48]  Andreas Blass,et al.  Inadequacy of computable loop invariants , 2001, TOCL.

[49]  Michael R. Clarkson,et al.  Polyglot: An Extensible Compiler Framework for Java , 2003, CC.

[50]  Reinhard Wilhelm,et al.  Parametric shape analysis via 3-valued logic , 1999, POPL '99.

[51]  Michael Frazier,et al.  Learning conjunctions of Horn clauses , 2004, Machine Learning.

[52]  Orna Kupferman,et al.  Synthesizing Distributed Systems , 2001, LICS.

[53]  Hamid Pirahesh,et al.  Extensible/rule based query rewrite optimization in Starburst , 1992, SIGMOD '92.

[54]  Alexander Aiken,et al.  Stochastic optimization of floating-point programs with tunable precision , 2014, PLDI.

[55]  Alexander Aiken,et al.  Scalable error detection using boolean satisfiability , 2005, POPL '05.

[56]  Alvin Cheung,et al.  Sloth: being lazy is a virtue (when issuing database queries) , 2014, SIGMOD Conference.

[57]  Martin C. Rinard,et al.  Compositional pointer and escape analysis for Java programs , 1999, OOPSLA '99.

[58]  Raffaele Garofalo The Entity Framework , 2011 .

[59]  Jeffrey Heer,et al.  Wrangler: interactive visual specification of data transformation scripts , 2011, CHI.

[60]  Sai Zhang,et al.  Automatically synthesizing SQL queries from input-output examples , 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[61]  Donovan A. Schneider,et al.  The Gamma Database Machine Project , 1990, IEEE Trans. Knowl. Data Eng..

[62]  Volker Markl,et al.  LEO - DB2's LEarning Optimizer , 2001, VLDB.

[63]  Martin L. Kersten,et al.  Fast, Randomized Join-Order Selection - Why Use Transformations? , 1994, VLDB.

[64]  David A. Cieslak,et al.  Learning Decision Trees for Unbalanced Data , 2008, ECML/PKDD.

[65]  Abraham Silberschatz,et al.  Learning and verifying quantified boolean queries by example , 2013, PODS '13.

[66]  Stéphane Bressan,et al.  Introduction to Database Systems , 2005 .

[67]  Rajeev Alur,et al.  Syntax-guided synthesis , 2013, 2013 Formal Methods in Computer-Aided Design.

[68]  Alvin Cheung,et al.  Optimizing database-backed applications with query synthesis , 2013, ACM-SIGPLAN Symposium on Programming Language Design and Implementation.

[69]  Abraham Silberschatz,et al.  Playful Query Specification with DataPlay , 2012, Proc. VLDB Endow..

[70]  Lauretta O. Osho,et al.  Axiomatic Basis for Computer Programming , 2013 .

[71]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[72]  Drew McDermott,et al.  Derivation of Glue Code for Agent Interoperation , 2000, AGENTS '00.

[73]  Philip Wadler,et al.  Links: Web Programming Without Tiers , 2006, FMCO.

[74]  Hamid Pirahesh,et al.  Complex query decorrelation , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[75]  David Gries,et al.  The Science of Programming , 1981, Text and Monographs in Computer Science.

[76]  Stanley B. Zdonik,et al.  Query Steering for Interactive Data Exploration , 2013, CIDR.

[77]  Craig Freedman,et al.  Hekaton: SQL server's memory-optimized OLTP engine , 2013, SIGMOD '13.

[78]  Alvin Cheung,et al.  Using program synthesis for social recommendations , 2012, CIKM.

[79]  Rastislav Bodík,et al.  Chlorophyll : Synthesis-Aided Compiler for Low-Power Spatial Architectures by Phitchaya Mangpo Phothilimthana , 2015 .

[80]  Robert W. Floyd,et al.  Assigning Meanings to Programs , 1993 .

[81]  Evaggelia Pitoura Query Optimization , 2009, Encyclopedia of Database Systems.

[82]  Peter Thanisch,et al.  Natural language interfaces to databases – an introduction , 1995, Natural Language Engineering.

[83]  Rastislav Bodík,et al.  Algorithmic program synthesis: introduction , 2013, International Journal on Software Tools for Technology Transfer.

[84]  Michael Stonebraker,et al.  C-Store: A Column-oriented DBMS , 2005, VLDB.

[85]  Nimrod Megiddo,et al.  Discovery-Driven Exploration of OLAP Data Cubes , 1998, EDBT.

[86]  Jennifer Widom,et al.  Synthesizing view definitions from data , 2010, ICDT '10.

[87]  César A. Galindo-Legaria,et al.  Counting, enumerating, and sampling of execution plans in a cost-based query optimizer , 2000, SIGMOD '00.

[88]  Sanjit A. Seshia,et al.  Combinatorial sketching for finite programs , 2006, ASPLOS XII.

[89]  Sumit Gulwani,et al.  Dimensions in program synthesis , 2010, Formal Methods in Computer Aided Design.

[90]  Joachim W. Schmidt,et al.  Some high level language constructs for data of type relation , 1977, TODS.

[91]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[92]  Alvin Cheung,et al.  Verified lifting of stencil computations , 2016, PLDI.

[93]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[94]  Yannis E. Ioannidis,et al.  Left-deep vs. bushy trees: an analysis of strategy spaces and its implications for query optimization , 1991, SIGMOD '91.

[95]  Rajeev Alur,et al.  TRANSIT: specifying protocols with concolic snippets , 2013, PLDI.

[96]  David Maier,et al.  Making smalltalk a database system , 1984, SIGMOD '84.

[97]  William R. Cook,et al.  Interprocedural query extraction for transparent persistence , 2008, OOPSLA.

[98]  Sumit Gulwani,et al.  Program verification using templates over predicate abstraction , 2009, PLDI '09.