Using SQL for Efficient Generation and Querying of Provenance Information

In applications such as data warehousing or data exchange, the ability to efficiently generate and query provenance information is crucial for understanding the origin of data. In this chapter, we review some of the main contributions of Perm, a DBMS that generates different types of provenance information for complex SQL queries (including nested and correlated subqueries and aggregation). The two key ideas behind Perm are representing data and its provenance together in a single relation and relying on query rewrites to generate this representation. Together, these ideas let Perm support fully integrated, on-demand provenance generation and querying using SQL. Because Perm rewrites a query requesting provenance into a regular SQL query and generates easily optimizable SQL code, its performance benefits greatly from the query optimization techniques provided by the underlying DBMS.
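To make the two key ideas concrete, the following is a minimal sketch of what a provenance-generating rewrite could look like for a simple aggregation query. It is not Perm's actual rewrite algorithm or syntax: the `sales` table, its contents, and the `prov_sales_*` column names are hypothetical, the rewrite is written by hand, and SQLite (via Python's standard `sqlite3` module) stands in for the underlying DBMS. The sketch also ignores NULL grouping keys, which a plain equality join would not match.

```python
import sqlite3

# In-memory database with a small, hypothetical input relation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, item TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('Merdies', 'Lawnmower', 100),
        ('Merdies', 'Ladder', 40),
        ('Joba', 'Hammer', 15);
""")

# Original query: total sales per shop.
ORIGINAL = "SELECT shop, SUM(amount) AS total FROM sales GROUP BY shop"

# Hand-written rewrite (an assumption, not Perm's output): every result
# tuple is paired with each input tuple of its group, so data and
# provenance end up side by side in one ordinary relation.
REWRITTEN = f"""
    SELECT q.shop   AS shop,
           q.total  AS total,
           s.shop   AS prov_sales_shop,
           s.item   AS prov_sales_item,
           s.amount AS prov_sales_amount
    FROM ({ORIGINAL}) AS q
    JOIN sales AS s ON s.shop = q.shop
"""


def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


original_rows = run(f"SELECT * FROM ({ORIGINAL}) ORDER BY shop")
provenance_rows = run(
    f"SELECT * FROM ({REWRITTEN}) ORDER BY shop, prov_sales_item"
)

# Because the rewritten query is plain SQL, provenance can itself be
# queried with SQL: which result tuples depend on a 'Ladder' sale?
depends_on_ladder = run(
    f"SELECT DISTINCT shop, total FROM ({REWRITTEN}) "
    "WHERE prov_sales_item = 'Ladder'"
)

for row in provenance_rows:
    print(row)
```

Since the rewritten statement is a regular SQL query, the DBMS can optimize and execute it like any other query, which is the property the abstract credits for Perm's performance.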
