Provenance in a Modifiable Data Set

The provenance of data is now widely recognized as being of great importance, thanks in large part to pioneering work [4, 6] by Peter Buneman and his collaborators, a line of research that continues to produce influential papers today [1-3, 7]. When we consume data from a database, we often care where those data came from and how they were derived. We may want answers to such questions to establish trust in the data, to investigate suspicious values, to debug code in the system, or for many other reasons. Considerable recent work has addressed issues related to provenance. However, this work typically assumes that the data sources from which results are derived are static. In reality, most data are modified over time, including the sources used to derive results of interest. Once provenance is considered in the presence of such modifications, many new problems arise. This chapter addresses two key problems in this context:

[1] Peter Buneman et al. XArch: archiving scientific and reference data. SIGMOD Conference, 2008.

[2] Jennifer Widom et al. Practical lineage tracing in data warehouses. Proceedings of the 16th International Conference on Data Engineering, 2000.

[3] Val Tannen et al. Update Exchange with Mappings and Provenance. VLDB, 2007.

[4] James Cheney et al. DBWiki: a structured wiki for curated data and collaborative data management. SIGMOD '11, 2011.

[5] Sanjeev Khanna et al. Data Provenance: Some Basic Issues. FSTTCS, 2000.

[6] Keishi Tajima et al. Archiving scientific data. SIGMOD '02, 2002.

[7] James Cheney et al. Curated databases. PODS, 2008.

[8] V. S. Subrahmanian et al. Maintaining views incrementally. SIGMOD Conference, 1993.

[9] James Cheney et al. Provenance management in curated databases. SIGMOD Conference, 2006.

[10] Sanjeev Khanna et al. Why and Where: A Characterization of Data Provenance. ICDT, 2001.

[11] Melanie Herschel et al. Explaining missing answers to SPJUA queries. Proc. VLDB Endow., 2010.

[12] Jeffrey F. Naughton et al. On the provenance of non-answers to queries over extracted data. Proc. VLDB Endow., 2008.

[13] Peter Buneman et al. Provenance in databases. SIGMOD '07, 2009.

[14] Dan Suciu et al. WHY SO? or WHY NO? Functional Causality for Explaining Query Answers. MUD, 2009.