Prospective and Retrospective Provenance Collection in Scientific Workflow Environments

Provenance, a record of the derivation history of scientific results, is critical for scientific workflows because it supports reproducibility, result interpretation, and problem diagnosis. Prospective provenance captures an abstract workflow specification as a recipe for future data derivation; retrospective provenance captures information about past workflow executions and data derivations. Both provide important context for the comprehensive analysis of scientific results. In this paper, we explore and design: i) a provenance model that represents both prospective and retrospective provenance as an extension of the Open Provenance Model (OPM), which models only retrospective provenance; ii) a provenance collection framework that collects both prospective and retrospective provenance according to our model; and iii) a relational provenance store for storing, reasoning over, and querying the prospective and retrospective provenance captured by the collection framework. An experimental study evaluates the performance of our provenance store using the provenance queries of the Third Provenance Challenge. Whereas most existing systems use an internal proprietary provenance model and rely on an import/export facility to convert between that model and OPM, our provenance collection framework and provenance store support OPM natively.
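To make the idea of a relational store holding both kinds of provenance concrete, the following is a minimal sketch and not the schema used in the paper. It assumes a hypothetical layout in which a `task` table holds prospective provenance (the workflow specification), while `process`, `artifact`, `used`, and `was_generated_by` hold retrospective provenance, with the last two named after OPM's `used` and `wasGeneratedBy` edges. All table names, column names, and sample rows are illustrative assumptions.

```python
import sqlite3

# Hypothetical schema: `task` is prospective (the recipe); the remaining
# tables are retrospective (what actually ran and what it touched).
SCHEMA = """
CREATE TABLE task (task_id TEXT PRIMARY KEY, workflow TEXT);
CREATE TABLE process (process_id TEXT PRIMARY KEY,
                      task_id TEXT REFERENCES task(task_id));
CREATE TABLE artifact (artifact_id TEXT PRIMARY KEY);
CREATE TABLE used (process_id TEXT, artifact_id TEXT);
CREATE TABLE was_generated_by (artifact_id TEXT, process_id TEXT);
"""

# Recursive query: walk wasGeneratedBy then used edges back from one artifact.
LINEAGE = """
WITH RECURSIVE ancestors(artifact_id) AS (
    SELECT u.artifact_id
    FROM was_generated_by g
    JOIN used u ON u.process_id = g.process_id
    WHERE g.artifact_id = ?
    UNION
    SELECT u.artifact_id
    FROM ancestors a
    JOIN was_generated_by g ON g.artifact_id = a.artifact_id
    JOIN used u ON u.process_id = g.process_id
)
SELECT artifact_id FROM ancestors ORDER BY artifact_id
"""


def build_store():
    """Create an in-memory store with a made-up two-step workflow run."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO task VALUES (?, ?)",
                     [("t_clean", "wf1"), ("t_plot", "wf1")])
    conn.executemany("INSERT INTO process VALUES (?, ?)",
                     [("p1", "t_clean"), ("p2", "t_plot")])
    conn.executemany("INSERT INTO artifact VALUES (?)",
                     [("raw",), ("clean",), ("plot",)])
    conn.executemany("INSERT INTO used VALUES (?, ?)",
                     [("p1", "raw"), ("p2", "clean")])
    conn.executemany("INSERT INTO was_generated_by VALUES (?, ?)",
                     [("clean", "p1"), ("plot", "p2")])
    return conn


def lineage(conn, artifact_id):
    """Return every artifact the given artifact was transitively derived from."""
    return [row[0] for row in conn.execute(LINEAGE, (artifact_id,))]


def producing_task(conn, artifact_id):
    """Link a retrospective artifact back to the prospective task that made it."""
    row = conn.execute(
        "SELECT p.task_id FROM was_generated_by g "
        "JOIN process p ON p.process_id = g.process_id "
        "WHERE g.artifact_id = ?",
        (artifact_id,),
    ).fetchone()
    return row[0] if row else None
```

With this sample data, `lineage(build_store(), "plot")` walks back through both process runs, and `producing_task` shows how a single join connects an execution record to the workflow specification it instantiated. Keeping both kinds of provenance in one relational store makes such cross-cutting queries expressible in plain SQL.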
