Managing Provenance in Scientific Workflows with ProvManager

Running scientific workflows in distributed environments has motivated provenance gathering approaches that are loosely coupled to workflow systems. We have proposed a provenance gathering strategy that is independent of workflow system technology. This strategy has evolved into a provenance management system named ProvManager. Its main principle is that each workflow activity collects its own provenance data and publishes it to a repository that scientists can query. In this paper we show how provenance is captured across distributed, heterogeneous systems. Two main strategies are used to capture provenance: Prolog predicates that register provenance, and an API for communication between the wrapped activity and ProvManager.
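
The principle that each activity collects and publishes its own provenance can be illustrated with a minimal Python sketch. It covers only the API-style strategy, and it is not ProvManager's actual interface: the names `ProvenanceRecord`, `InMemoryRepository`, `with_provenance`, and the example activity `normalize` are all hypothetical, and an in-memory list stands in for the shared repository.

```python
import functools
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ProvenanceRecord:
    """One provenance entry for a single activity execution (hypothetical schema)."""
    activity: str
    inputs: Dict[str, Any]
    output: Any
    started_at: float
    finished_at: float


@dataclass
class InMemoryRepository:
    """Stand-in for a shared provenance repository (assumption: a real one would be remote)."""
    records: List[ProvenanceRecord] = field(default_factory=list)

    def publish(self, record: ProvenanceRecord) -> None:
        self.records.append(record)

    def query(self, activity: str) -> List[ProvenanceRecord]:
        # Return every record published by the named activity.
        return [r for r in self.records if r.activity == activity]


def with_provenance(repository: InMemoryRepository) -> Callable:
    """Wrap an activity so that it publishes its own provenance after running."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(**kwargs: Any) -> Any:
            started = time.time()
            result = func(**kwargs)
            repository.publish(
                ProvenanceRecord(
                    activity=func.__name__,
                    inputs=dict(kwargs),
                    output=result,
                    started_at=started,
                    finished_at=time.time(),
                )
            )
            return result
        return wrapper
    return decorator


repo = InMemoryRepository()


@with_provenance(repo)
def normalize(values: List[float]) -> List[float]:
    # Hypothetical workflow activity: scale values so that they sum to 1.
    total = sum(values)
    return [v / total for v in values]


result = normalize(values=[1, 3])
records = repo.query("normalize")
```

Because the wrapper, not the workflow engine, publishes the record, the same activity could run under different workflow systems and still report to one repository, which is the loose coupling the abstract describes.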
