ProvManager: a provenance management system for scientific workflows

Running scientific workflows in distributed and heterogeneous environments has motivated provenance management approaches that are loosely coupled to the workflow execution engine. Such approaches are attractive because they allow provenance data to be stored and accessed in a homogeneous way, even in an environment where different workflow management systems work together. However, current approaches overload scientists with many ad hoc tasks, such as adapting scripts and implementing extra functionality to achieve provenance independence. This paper proposes ProvManager, a provenance management approach that eases the gathering, storage, and analysis of provenance information in distributed and heterogeneous environments without placing the burden of adaptation on the scientist. ProvManager supports provenance management at the experiment level by integrating different workflow executions from multiple workflow management systems. Copyright © 2011 John Wiley & Sons, Ltd.
