A SQL-Middleware Unifying Why and Why-Not Provenance for First-Order Queries

Explaining why an answer is in the result of a query or why it is missing from the result is important for many applications including auditing, debugging data and queries, and answering hypothetical questions about data. Both types of questions, i.e., why and why-not provenance, have been studied extensively. In this work, we present the first practical approach for answering such questions for queries with negation (first-order queries). Our approach is based on a rewriting of Datalog rules (called firing rules) that captures successful rule derivations within the context of a Datalog query. We extend this rewriting to support negation and to capture failed derivations that explain missing answers. Given a (why or why-not) provenance question, we compute an explanation, i.e., the part of the provenance that is relevant to answering the question. We introduce optimizations that prune parts of a provenance graph early on if we can determine that they will not be part of the explanation for a given question. We present an implementation that runs on top of a relational database and uses SQL to compute explanations. Our experiments demonstrate that our approach scales to large instances and significantly outperforms an earlier approach that instantiates the full provenance to compute explanations.
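
The idea of capturing successful and failed rule derivations with SQL can be illustrated with a small sketch. This is not the rewriting or the implementation described above; it is a simplified illustration under stated assumptions. The relation `hop`, the example rule `Q(X,Y) :- hop(X,Z), hop(Z,Y), not hop(X,Y)`, and the helper names `build_db`, `successful_derivations`, and `failed_derivations` are all hypothetical, and SQLite stands in for the relational backend. A successful derivation is recorded as a binding of all rule variables (including the existential `Z`); for a why-not question about a missing tuple, candidate bindings for `Z` are drawn from the active domain and each goal is marked as satisfied (1) or failed (0).

```python
import sqlite3


def build_db():
    """Create an in-memory database with a small, hypothetical edge relation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hop (src TEXT, dst TEXT)")
    conn.executemany(
        "INSERT INTO hop VALUES (?, ?)",
        [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")],
    )
    return conn


# Example rule (an assumption for illustration):
#   Q(X,Y) :- hop(X,Z), hop(Z,Y), not hop(X,Y)
# The query keeps ALL rule variables (X, Z, Y), so each result row
# identifies one successful derivation of the rule.
SUCCESSFUL_SQL = """
SELECT h1.src AS x, h1.dst AS z, h2.dst AS y
FROM hop h1 JOIN hop h2 ON h1.dst = h2.src
WHERE NOT EXISTS (
    SELECT 1 FROM hop h3 WHERE h3.src = h1.src AND h3.dst = h2.dst
)
ORDER BY x, z, y
"""

# For a why-not question about Q(x, y): enumerate candidate bindings for Z
# over the active domain and record, per goal, whether it holds (1) or not (0).
# Only bindings where at least one goal fails are returned.
FAILED_SQL = """
WITH adom(v) AS (SELECT src FROM hop UNION SELECT dst FROM hop)
SELECT z, g1, g2, g3 FROM (
    SELECT adom.v AS z,
           EXISTS (SELECT 1 FROM hop WHERE src = :x AND dst = adom.v) AS g1,
           EXISTS (SELECT 1 FROM hop WHERE src = adom.v AND dst = :y) AS g2,
           NOT EXISTS (SELECT 1 FROM hop WHERE src = :x AND dst = :y) AS g3
    FROM adom
)
WHERE NOT (g1 AND g2 AND g3)
ORDER BY z
"""


def successful_derivations(conn):
    """Return (x, z, y) bindings for which the example rule body succeeds."""
    return conn.execute(SUCCESSFUL_SQL).fetchall()


def failed_derivations(conn, x, y):
    """Return (z, g1, g2, g3) for bindings of Z where deriving Q(x, y) fails."""
    return conn.execute(FAILED_SQL, {"x": x, "y": y}).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print("successful:", successful_derivations(conn))
    print("why-not Q(a, c):", failed_derivations(conn, "a", "c"))
```

In this sketch the why-not question for `Q(a, c)` returns one row per candidate `Z`, and the third flag is 0 in every row because `hop(a, c)` exists, so the negated goal fails regardless of `Z`. Restricting the failed-derivation query to the constants of the question (`:x`, `:y`) is a very rough analogue of computing only the part of the provenance relevant to a given question rather than all derivations.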
