Checkpointing Based Fault Tolerant Job Scheduling System for Computational Grid

A computational grid, owing to its heterogeneous, autonomous, and dynamic nature, is prone to various kinds of faults, which may delay a job's completion or even force the job to restart from the beginning. Checkpointing plays a vital role in making the grid more reliable, cost-effective, and efficient. In this paper, we propose schemes based on system checkpointing and application checkpointing and compare their performance through an empirical study. The ABSC scheme is suitable for applications that are not computationally intensive. For computationally intensive applications where reliability is more important, the ABAC scheme is more suitable; it may incur slight overhead in fault-free situations but is very reliable in faulty ones.
