A Dataflow-Oriented Atomicity and Provenance System for Pipelined Scientific Workflows

Scientific workflows have gained great momentum in recent years because of their critical role in e-Science and cyberinfrastructure applications. However, some tasks of a scientific workflow might fail during execution, so a domain scientist might require a region of the workflow to be "atomic". Data provenance, which determines the source data used to produce a data item, is also essential to scientific workflows. In this paper, we propose: (i) an architecture for scientific workflow management systems that supports both provenance and atomicity; (ii) a dataflow-oriented atomicity model that supports the notions of commit and abort; and (iii) a dataflow-oriented provenance model that, in addition to supporting existing provenance graphs and queries, also supports queries related to atomicity and failure.
