A Generic Provenance Middleware for Database Queries, Updates, and Transactions

We present an architecture and prototype implementation for a generic provenance database middleware (GProM) that is based on the concept of query rewrites, which are applied to an algebraic graph representation of database operations. The system supports a wide range of provenance types and representations for queries, updates, transactions, and operations spanning multiple transactions. GProM supports several strategies for provenance generation, such as on-demand, rule-based, and “always on”. To the best of our knowledge, we are the first to present a solution for computing the provenance of concurrent database transactions. Our solution can retroactively trace transaction provenance as long as an audit log and time travel functionality are available; most DBMSs support both. Other noteworthy features of GProM include extensibility through a declarative rewrite rule specification language, support for multiple database backends, and an optimizer for rewritten queries.
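To make the idea of a provenance-generating query rewrite concrete, the following is a minimal sketch and not GProM's actual implementation or rule language. It assumes a simple select-project query over one table and rewrites it so that each result row also carries the attributes of the input row it was derived from. The function name `rewrite_selection`, the `prov_<table>_<column>` naming scheme, and the `emp` table are hypothetical, chosen only for illustration.

```python
import sqlite3


def rewrite_selection(table, columns, out_cols, predicate):
    """Build a provenance-carrying version of
    SELECT out_cols FROM table WHERE predicate.

    Hypothetical helper: the rewritten query returns the original output
    columns plus a renamed copy of every input column, so each result row
    is paired with the input row that produced it.
    """
    prov_cols = ", ".join(f"{c} AS prov_{table}_{c}" for c in columns)
    return (
        f"SELECT {', '.join(out_cols)}, {prov_cols} "
        f"FROM {table} WHERE {predicate}"
    )


# Illustrative data; the schema is an assumption, not taken from the paper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("ann", "db", 90), ("bob", "os", 40), ("cy", "db", 70)],
)

# Original query: SELECT dept FROM emp WHERE salary > 50
sql = rewrite_selection("emp", ["name", "dept", "salary"], ["dept"], "salary > 50")
rows = conn.execute(sql + " ORDER BY prov_emp_name").fetchall()

for row in rows:
    print(row)
```

Each printed row starts with the original query result (`dept`) followed by the contributing input tuple. The rewritten query is ordinary SQL, so it runs on the backend database unchanged, which is the property a middleware approach relies on.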
