Explaining Natural Language query results

Multiple lines of research have developed Natural Language (NL) interfaces for formulating database queries. We build upon this work, but focus on presenting a highly detailed form of the answers in NL. The answers that we present are based on the provenance of tuples in the query result, detailing not only the results but also their explanations. We develop a novel method for transforming provenance information to NL by leveraging the structure of the original NL query. Furthermore, since provenance information is typically large and complex, we present two solutions for its effective presentation as NL text: one based on provenance factorization, with novel desiderata relevant to the NL case, and one based on summarization. We have implemented our solution in an end-to-end system supporting questions, answers and provenance, all expressed in NL. Our experiments, including a user study, indicate the quality of our solution and its scalability.
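
To illustrate what provenance factorization can mean in practice, the following is a minimal sketch, not the method of the paper. It assumes provenance is given as a sum of monomials, each monomial being a set of tuple annotations, and it greedily pulls out the single most frequent annotation using distributivity. The names `factor_once` and `render`, and the annotations `a` through `e`, are hypothetical and chosen only for this example.

```python
from collections import Counter


def factor_once(monomials):
    """Pull the most frequent annotation out of a sum of monomials.

    Each monomial is a frozenset of annotations (read as a product).
    Returns (var, inside, rest), read as var*(sum of inside) + sum of rest.
    If no annotation occurs in at least two monomials, var is None.
    This is an illustrative greedy step, not the paper's algorithm.
    """
    counts = Counter(v for m in monomials for v in m)
    if not counts:
        return None, [], list(monomials)
    var, freq = counts.most_common(1)[0]
    if freq < 2:
        return None, [], list(monomials)
    inside = [m - {var} for m in monomials if var in m]
    rest = [m for m in monomials if var not in m]
    return var, inside, rest


def render(var, inside, rest):
    """Render the factorized form as a string, sorting annotations for stability."""
    terms = []
    if var is not None:
        inner = " + ".join("*".join(sorted(m)) or "1" for m in inside)
        terms.append(f"{var}*({inner})")
    terms += ["*".join(sorted(m)) for m in rest]
    return " + ".join(terms)


# Provenance a*b + a*c + d*e: annotation "a" is shared by two derivations.
provenance = [frozenset("ab"), frozenset("ac"), frozenset("de")]
var, inside, rest = factor_once(provenance)
factored = render(var, inside, rest)
```

In this sketch the shared annotation is mentioned once in the factorized form, which is the kind of repetition a textual explanation would otherwise spell out for every derivation. Which annotation to pull out first, and how that choice interacts with the sentence structure, is exactly where NL-specific desiderata would come in; the greedy frequency rule here is only a stand-in.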
