Explaining Query Answers with Explanation-Ready Databases

With the increased generation and availability of big data in different domains, there is a pressing need for data analysis tools that can 'explain' the trends and anomalies found in this data to users with different backgrounds. Wu and Madden (PVLDB 2013) and Roy and Suciu (SIGMOD 2014) recently proposed solutions that explain interesting or unexpected answers to simple aggregate queries in terms of predicates on attributes. In this paper, we propose a generic framework that supports much richer and more insightful explanations by preparing the database offline, so that top explanations can be found interactively at query time. The main idea in such explanation-ready databases is to pre-compute the effects of potential explanations (called interventions) and to efficiently re-evaluate the original query taking these effects into account. We formalize this notion and define an explanation-query that evaluates all possible explanations simultaneously, without having to run an iterative process; we also develop algorithms and optimizations, and evaluate our approach with experiments on real data.

[1] Nick Roussopoulos, et al. An incremental access method for ViewCache: concept, algorithms, and cost analysis, 1991, TODS.

[2] Cong Yu, et al. MRI: Meaningful Interpretations of Collaborative Ratings, 2011, Proc. VLDB Endow.

[3] Johannes Gehrke, et al. Explainable security for relational databases, 2014, SIGMOD Conference.

[4] Dan Suciu, et al. A formal approach to finding explanations for database queries, 2014, SIGMOD Conference.

[5] Dan Suciu, et al. PerfXplain: Debugging MapReduce Job Performance, 2012, Proc. VLDB Endow.

[6] David P. Woodruff, et al. Multi-Tuple Deletion Propagation: Approximations and Complexity, 2013, Proc. VLDB Endow.

[7] Daniel Deutch, et al. Provenance for aggregate queries, 2011, PODS.

[8] Leonid Libkin, et al. Incremental maintenance of views with duplicates, 1995, SIGMOD '95.

[9] Srinivasan Parthasarathy, et al. Query by output, 2009, SIGMOD Conference.

[10] Frank Wm. Tompa, et al. Efficiently updating materialized views, 1986, SIGMOD '86.

[11] Dan Suciu, et al. Causality and Explanations in Databases, 2014, Proc. VLDB Endow.

[12] Samuel Madden, et al. Scorpion: Explaining Away Outliers in Aggregate Queries, 2013, Proc. VLDB Endow.

[13] Suman Nath, et al. Tracing data errors with view-conditioned causality, 2011, SIGMOD '11.

[14] Christoph Koch, et al. Incremental query evaluation in a ring of databases, 2010, PODS.

[15] Rada Chirkova, et al. Materialized Views, 2012, Found. Trends Databases.

[16] Amir Shaikhha, et al. DBToaster: higher-order delta processing for dynamic, frequently fresh views, 2012, The VLDB Journal.

[17] J. Pearl. Causality: Models, Reasoning and Inference, 2000.

[18] Limin Jia, et al. Maintaining distributed logic programs incrementally, 2011, Comput. Lang. Syst. Struct.

[19] Latha S. Colby, et al. Algorithms for deferred view maintenance, 1996, SIGMOD '96.

[20] Yannis Papakonstantinou, et al. Utilizing IDs to Accelerate Incremental View Maintenance, 2015, SIGMOD Conference.

[21] Jennifer Widom, et al. On-line warehouse view maintenance, 1997, SIGMOD '97.

[22] Sumit Gulwani, et al. FlashExtract: a framework for data extraction by examples, 2014, PLDI.

[23] Jose A. Blakeley, et al. Efficiently updating materialized views, 1986.

[24] Jennifer Widom, et al. Synthesizing view definitions from data, 2010, ICDT '10.

[25] Daniel Fabbri, et al. Explanation-Based Auditing, 2011, Proc. VLDB Endow.

[26] V. S. Subrahmanian, et al. Maintaining views incrementally, 1993, SIGMOD Conference.

[27] Dan Suciu, et al. The Complexity of Causality and Responsibility for Query Answers and non-Answers, 2010, Proc. VLDB Endow.