A Hybrid Approach to Answering Why-Not Questions on Relational Query Results

When analyzing and debugging data transformations, and relational queries in particular, one subproblem is understanding why some data are not part of the query result. This problem has recently been addressed from different perspectives for various fragments of relational queries, and these perspectives yield different yet complementary explanations of such missing answers. First, this article aims to unify the different approaches by defining a new type of explanation, called hybrid explanation, that encompasses the variety of previously defined explanation types. This solution goes beyond simply forming the union of explanations produced by different algorithms and is shown to explain a larger set of missing answers. Second, we present Conseil, an algorithm that generates hybrid explanations. Conseil is also the first algorithm to handle nonmonotonic queries. Experiments on efficiency and explanation quality show that Conseil is comparable to, and in some cases outperforms, previous algorithms. This article extends a previous short conference paper by providing proofs, additional theorems, and a detailed discussion of each step of the Conseil algorithm. It also significantly extends the experimental evaluation of efficiency and explanation quality.
