Challenges in Managing Implicit and Abstract Provenance Data: Experiences with ProvManager

Running scientific workflows in distributed and heterogeneous environments has motivated the definition of provenance gathering approaches that are loosely coupled to workflow management systems. We have developed a provenance management system named ProvManager to manage provenance data in distributed and heterogeneous environments, independently of any specific Scientific Workflow Management System. Our experience using ProvManager in real workflow applications has revealed several provenance management issues that current related work does not address. We have faced challenges such as the need to deal with implicit provenance data and the lack of higher levels of provenance abstraction. This paper discusses these challenges and points to directions for addressing them, contextualizing them according to our experience in developing ProvManager.
