Fault-tolerant behavior in state-of-the-art grid workflow management systems

Although the workflow paradigm, which emerged from the field of business processes, has proven to be the most successful paradigm for creating scientific applications, including those executed on Grid infrastructures, most current Grid workflow management systems still cannot deliver the quality, robustness, and reliability needed for widespread acceptance as day-to-day tools for scientists from a multitude of scientific fields. This paper presents the current state of the art in fault tolerance techniques for Grid workflow systems. The examined categories and the summary of current solutions reveal future directions in this area and help guide research towards open issues.
