On the Limitations of Provenance for Queries with Difference

Annotating the results of database transformations has been shown to be very effective for various applications. Until recently, most work in this context focused on positive query languages. Provenance semirings are a particular approach that has proven effective for these languages, and it was shown that when provenance is propagated with semirings, the expected equivalence axioms of the corresponding query languages are satisfied. There have been several attempts to extend the framework to account for relational algebra queries with difference. We show here that these proposals fail to satisfy some expected equivalence axioms (which, in particular, hold for queries on “standard” set and bag databases). Interestingly, we show that this is not a pitfall of these particular attempts; rather, every such attempt is bound to fail to satisfy these axioms for some semirings. Finally, we show particular semirings for which an extension supporting difference is (im)possible.
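To make the notion of an "expected equivalence axiom" concrete, the following minimal sketch checks one candidate identity on the two "standard" cases mentioned above: set semantics (Boolean annotations) and bag semantics (natural-number multiplicities). The identity tested, that the join operation distributes over difference, a ⊗ (b ⊖ c) = (a ⊗ b) ⊖ (a ⊗ c), is chosen here for illustration only; the abstract does not state which axioms are studied. The helper names `monus_nat`, `monus_bool`, and `distributes` are hypothetical.

```python
from itertools import product

def monus_nat(a, b):
    """Truncated subtraction on natural numbers (difference on bag multiplicities)."""
    return max(a - b, 0)

def monus_bool(a, b):
    """'a and not b' on Booleans (difference under set semantics)."""
    return a and not b

def distributes(values, times, monus):
    """Check a*(b - c) == (a*b) - (a*c) for all triples drawn from `values`.

    The identity is an illustrative assumption, not taken from the abstract.
    """
    return all(
        times(a, monus(b, c)) == monus(times(a, b), times(a, c))
        for a, b, c in product(values, repeat=3)
    )

# Bag semantics: multiplicities 0..5 with ordinary multiplication as join.
nat_ok = distributes(range(6), lambda a, b: a * b, monus_nat)

# Set semantics: Booleans with conjunction as join.
bool_ok = distributes([False, True], lambda a, b: a and b, monus_bool)

print(nat_ok, bool_ok)
```

A brute-force check over a small range is only evidence, not a proof, but the same `distributes` helper can be pointed at any other candidate annotation structure by supplying its own `times` and `monus`.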
