Pipeline-centric provenance model

In this paper we propose a new provenance model tailored to a class of workflow-based applications. We motivate the approach with use cases from the astronomy community, generalize the class of applications to which it is relevant, and present the resulting pipeline-centric provenance model. Finally, we evaluate the storage benefits of the approach when applied to an astronomy application.
