Load balancing with fault tolerance is an important issue in grid computing. Fault tolerance matters because the dependability of individual grid resources cannot be guaranteed, while grid services are often expected to meet minimum levels of service. In distributed systems, fault tolerance is commonly achieved through checkpoint-recovery and through task replication on alternative resources in case of a system outage. To address this issue, we propose a fault-tolerant load balancing model: we designed and implemented a fault detector and a fault manager within the existing intra-cluster and intra-grid load balancing model, thereby making it fault tolerant. Task execution performance improved because the fault manager migrates tasks. We compared the performance of our fault tolerance technique with that of the checkpoint-recovery technique.