A New Perspective on Semantics of Data Provenance

Data provenance refers to the "origin", "lineage", and "source" of data. In this work, we examine provenance from a semantics perspective and present the W7 model, an ontological model of data provenance. In the W7 model, provenance is conceptualized as a combination of seven interconnected elements: "what", "when", "where", "how", "who", "which", and "why". Each of these elements may be used to track events that affect data during its lifetime. The W7 model is general and extensible enough to capture provenance semantics for data in different domains. Using Wikipedia as an example, we illustrate how the W7 model can capture domain- or application-specific provenance.
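To make the seven elements concrete, the following is a minimal Python sketch of how a single provenance event might be recorded and queried. It is an illustration only and not the model's formal ontology: the class name `W7Record`, the helper `history`, the per-field interpretations in the comments, and the sample Wikipedia edits are all hypothetical assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class W7Record:
    """One provenance event described by the seven W7 elements.

    The field interpretations below are assumptions made for this sketch.
    """
    what: str                  # event that affected the data
    when: datetime             # time at which the event occurred
    where: str                 # location associated with the event
    how: str                   # action that led to the event
    who: str                   # agent involved in the event
    which: str                 # tool or software used
    why: Optional[str] = None  # reason for the event, if one was recorded


def history(records, who=None):
    """Return records in chronological order, optionally for one agent."""
    selected = [r for r in records if who is None or r.who == who]
    return sorted(selected, key=lambda r: r.when)


# Hypothetical Wikipedia page edits; every value here is made up.
edits = [
    W7Record(
        what="revision",
        when=datetime(2009, 3, 2, tzinfo=timezone.utc),
        where="en.wikipedia.org/wiki/Example",
        how="reverted previous edit",
        who="UserB",
        which="web editor",
        why="vandalism",
    ),
    W7Record(
        what="creation",
        when=datetime(2009, 3, 1, tzinfo=timezone.utc),
        where="en.wikipedia.org/wiki/Example",
        how="added initial text",
        who="UserA",
        which="web editor",
    ),
]

for record in history(edits):
    print(record.when.date(), record.who, record.what, record.why)
```

Keeping "why" optional reflects the assumption that a reason is not always recorded for an event. A domain-specific extension could subclass `W7Record` or add fields without changing the query helper.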
