Fault-Tolerance in Dataflow-Based Scientific Workflow Management

This paper addresses the challenges of providing fault tolerance in scientific workflow management. The specification and handling of faults in scientific workflows should be defined precisely to ensure that execution remains consistent with process-specific requirements. We identified a number of typical failure patterns that occur in real-life scientific workflow executions. Following the intuitive recovery strategies that correspond to these patterns, we developed methodologies that integrate recovery fragments into fault-prone scientific workflow models. Compared to existing fault-tolerance mechanisms, the proposed approach reduces the effort required of workflow designers by defining recovery fragments automatically. Furthermore, the developed framework implements the mechanisms needed to capture faults from the different layers of a scientific workflow management architecture. Experience indicates that the framework can be employed effectively to model, capture, and tolerate the identified failure patterns.
