Graph-Based Task Replication for Workflow Applications

The Grid is a heterogeneous and dynamic environment that enables distributed computation, which also makes it prone to failures. Related work uses replication to overcome failures in sets of independent tasks and in workflow applications, but it does not consider possible resource limitations when scheduling the replicas. In this paper, we focus on task replication techniques for workflow applications. We aim not only to tolerate failures during an execution but also to speed up the computation, without requiring the user to implement application-level checkpointing, which can be difficult depending on the application. We also study what to do when there are not enough resources to replicate all running tasks: we assign replication priorities based on the graph of the workflow application, giving higher priority to tasks with a higher output degree. We have implemented the proposed policy in the GRID superscalar system and run fastDNAml as an experiment to show that these objectives are met. Finally, we have identified and studied a problem that may arise when replication is used in workflow applications: the replication wait time.
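The output-degree priority can be illustrated with a minimal sketch. It is not the GRID superscalar implementation: the function names, the dictionary representation of the workflow graph, the notion of "free slots", and the alphabetical tie-break are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, List

# Hypothetical representation: each task maps to the list of tasks that
# directly consume its output (its successors in the workflow graph).
Graph = Dict[str, List[str]]


def out_degree(graph: Graph, task: str) -> int:
    """Return the number of direct successors of `task`."""
    return len(graph.get(task, []))


def select_replicas(graph: Graph, running: Iterable[str], free_slots: int) -> List[str]:
    """Choose which running tasks to replicate when resources are limited.

    Tasks with a higher output degree come first. Ties are broken by task
    name so the result is deterministic (an assumption of this sketch).
    """
    if free_slots <= 0:
        return []
    ranked = sorted(running, key=lambda t: (-out_degree(graph, t), t))
    return ranked[:free_slots]


# Toy workflow, invented for this example.
workflow: Graph = {
    "A": ["B", "C", "D"],
    "B": ["E"],
    "C": ["E", "F"],
    "D": [],
    "E": [],
    "F": [],
}

# With only two free resources, the two tasks with the most successors win.
chosen = select_replicas(workflow, ["B", "C", "D", "A"], free_slots=2)
print(chosen)  # ['A', 'C']
```

In this toy graph, a failure of task A would block three successors, so replicating it first protects the largest part of the remaining workflow. This is the intuition behind ranking by output degree.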
