Automatic capture and reconstruction of computational provenance

The Earth System Science Server (ES3) project is developing a local infrastructure for managing Earth science data products derived from satellite remote sensing. By 'local,' we mean the infrastructure that a scientist uses to manage the creation and dissemination of her own data products, particularly those that are continually revised to incorporate corrections or improvements from the scientist's own research. Therefore, in addition to being robust and capacious enough to support public access, ES3 is intended to be flexible enough to manage the idiosyncratic computing ensembles that typify scientific research. Instead of requiring provenance to be specified explicitly with a workflow model, ES3 extracts provenance information automatically from arbitrary applications by monitoring their interactions with their execution environment. These interactions (arguments, file I/O, system calls, etc.) are logged to the ES3 database, which assembles them into provenance graphs. These graphs resemble workflow specifications, but are really reports: they describe what actually happened, as opposed to what was requested. The ES3 database supports forward and backward navigation through provenance graphs (i.e., ancestor-descendant queries), as well as retrieval of whole graphs. Copyright © 2007 John Wiley & Sons, Ltd.
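The idea of assembling logged interactions into a provenance graph and then navigating it in both directions can be illustrated with a minimal sketch. This is not the ES3 implementation or its database schema: the `ProvenanceGraph` class, the `log_event` method, and the process and file names below are all hypothetical, chosen only to show how logged reads and writes might yield ancestor and descendant queries.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Toy in-memory provenance graph (hypothetical, not the ES3 schema)."""

    def __init__(self):
        # Adjacency in both directions, so each query is a plain traversal.
        self.children = defaultdict(set)
        self.parents = defaultdict(set)

    def log_event(self, process, inputs=(), outputs=()):
        """Record one observed process run: files it read and files it wrote."""
        for name in inputs:
            self._add_edge(name, process)
        for name in outputs:
            self._add_edge(process, name)

    def _add_edge(self, src, dst):
        self.children[src].add(dst)
        self.parents[dst].add(src)

    @staticmethod
    def _reach(start, adjacency):
        """Breadth-first search returning every node reachable from start."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def ancestors(self, node):
        """Backward navigation: everything the node was derived from."""
        return self._reach(node, self.parents)

    def descendants(self, node):
        """Forward navigation: everything derived from the node."""
        return self._reach(node, self.children)


# Hypothetical log of two observed process runs.
g = ProvenanceGraph()
g.log_event("calibrate", inputs=["raw.hdf"], outputs=["calibrated.hdf"])
g.log_event("merge", inputs=["calibrated.hdf", "mask.hdf"], outputs=["product.nc"])

print(sorted(g.ancestors("product.nc")))
print(sorted(g.descendants("raw.hdf")))
```

Because the graph is built only from what was observed, it records what actually ran rather than what a workflow specification requested, which is the distinction the abstract draws.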
