Data Provenance: Some Basic Issues

The ease with which one can copy and transform data on the Web has made it increasingly difficult to determine the origins of a piece of data. We use the term data provenance to refer to the process of tracing and recording the origins of data and its movement between databases. Provenance is now an acute issue in scientific databases, where it is central to the validation of data. In this paper we discuss some of the technical issues that have emerged in an initial exploration of the topic.
