A novel update propagation module for the data provenance problem: a contemplating vision on realizing data provenance from models to storage

The systems approach to science, which emphasizes the connections among phenomena studied at different scales and by different disciplines, is dramatically changing how scientific results are communicated. These changes drive a shift in how data with certain properties is propagated so that others can use it intelligently. In this work we elaborate on three major factors governing the propagation module of data provenance. The proposed propagation module offers an efficient solution to many critical problems in the management and provenance of scientific data. Unlike previous work, ours aims at realizing data provenance from models to storage. The natural representation of data as objects, and its utility for capturing provenance, has led us to consider a new storage architecture based on object-based storage device (OSD) technology. We outline this framework.
