Understanding Collaborative Studies through Interoperable Workflow Provenance

The provenance of a data product records how the product was derived, and is crucial for enabling scientists to understand, reproduce, and verify scientific results. Currently, most provenance models are designed to capture the provenance of a single workflow run, usually executed by a single user. However, a scientific discovery is often the result of the methodical execution of many scientific workflows over many datasets, produced at different times by one or more users. Further, to promote and facilitate the exchange of information between workflow systems that support provenance, the scientific workflow community has proposed the Open Provenance Model (OPM). In this paper, we describe a new query model that captures implicit user collaborations. We show how this model maps to OPM and helps to answer collaborative queries, e.g., identifying combined workflows and the contributions of users collaborating on a project, based on the records of previous workflow executions. We also adopt the high-level Query Language for Provenance (QLP) and extend it with additional constructs, and show how these extensions allow non-expert users to express collaborative provenance queries against this model easily and concisely. Furthermore, we adopt the Provenance Challenge 3 (PC3) workflows as a collaborative and interoperable use case scenario, in which different stages of the workflow are executed in three different workflow environments: Kepler, Taverna, and WS-VLAM. Through this use case, we demonstrate how collaborative studies can be established and understood through interoperable workflow provenance.
