DBNotes: a post-it system for relational databases based on provenance

We demonstrate DBNotes, a Post-It note system for relational databases in which every piece of data may be associated with zero or more notes (or annotations). These annotations are transparently propagated as data is transformed. The propagation method is based on provenance (also known as lineage): the annotations associated with a piece of data d in the result of a transformation consist of the annotations associated with each piece of source data from which d is copied. One immediate application of this system is to use annotations to systematically trace the provenance and flow of data. If every piece of source data is annotated with its address (i.e., its origin), then the annotations on a piece of data in the result of a transformation describe its provenance. Hence, one can determine the provenance of data through a sequence of transformation steps simply by examining the annotations. Annotations can also be used to store additional information about data. Since a database schema is often proprietary, the ability to add new information about data without changing the underlying schema is a useful feature. For example, an error report could be attached to an erroneous piece of data, and this report would be propagated to other databases along transformations, thus notifying other users of the error. Overall, the annotations on the result of a transformation can also provide an estimate of the quality of the resulting database.
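
The propagation rule described above can be illustrated with a minimal Python sketch. It is not the DBNotes implementation or its query language: the names `Cell`, `annotate_with_address`, `add_note`, `select`, and `project`, the `Emp` relation, and the address format `Emp[0].dept` are all hypothetical, chosen only for illustration. The sketch assumes that selection and projection copy cells unchanged, and that when projection merges duplicate tuples, each output cell carries the union of the notes of the source cells it was copied from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A single value together with the set of notes attached to it."""
    value: object
    notes: frozenset = frozenset()


def annotate_with_address(name, rows):
    """Attach to every cell a note recording its origin (hypothetical address format)."""
    return [
        {attr: Cell(v, frozenset({f"{name}[{i}].{attr}"})) for attr, v in row.items()}
        for i, row in enumerate(rows)
    ]


def add_note(cell, note):
    """Return a copy of the cell with one extra note; the value is unchanged."""
    return Cell(cell.value, cell.notes | {note})


def select(rows, predicate):
    """Keep rows whose plain values satisfy the predicate; cells are copied as-is."""
    return [row for row in rows if predicate({a: c.value for a, c in row.items()})]


def project(rows, attrs):
    """Keep only attrs; duplicate output tuples are merged and their notes unioned."""
    merged = {}
    for row in rows:
        key = tuple(row[a].value for a in attrs)
        if key in merged:
            merged[key] = {
                a: Cell(row[a].value, merged[key][a].notes | row[a].notes)
                for a in attrs
            }
        else:
            merged[key] = {a: row[a] for a in attrs}
    return list(merged.values())


# Source relation: every cell starts with a note describing its own address.
emp = annotate_with_address("Emp", [
    {"name": "Ann", "dept": "Sales"},
    {"name": "Bob", "dept": "Sales"},
    {"name": "Cy", "dept": "IT"},
])

# Attach an error report to one source cell; it travels with any copy of that cell.
emp[1]["dept"] = add_note(emp[1]["dept"], "error: dept unverified")

# Transformation: the departments of all employees in Sales.
result = project(select(emp, lambda r: r["dept"] == "Sales"), ["dept"])

for row in result:
    for attr, cell in row.items():
        print(attr, cell.value, sorted(cell.notes))
```

In this sketch the single output cell is copied from two source cells, so its notes name both addresses, which is its provenance, and also include the error report attached to one of them.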
