Efficient task replication and management for adaptive fault tolerance in Mobile Grid environments

Fault-tolerant Grid computing is of vital importance as the Grid and Mobile computing worlds converge into the Mobile Grid computing paradigm. We present an efficient scheme based on task replication, which applies the Weibull reliability function to the Grid resources in order to estimate how many replicas must be scheduled to guarantee a specific fault tolerance level in the Grid environment. The additional workload produced by replication is handled by a resource management scheme based on a knapsack formulation, which aims to maximize the utilization and profit of the Grid infrastructure. The proposed model has been evaluated through simulation, and the results indicate that it is efficient enough to be used as part of a middleware approach in future Mobile Grid environments.
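The abstract does not spell out the formulas, so the following is only a minimal sketch of the two ingredients it names, under assumptions that are ours rather than the authors': a two-parameter Weibull reliability R(t) = exp(-(t/scale)^shape); replicas that fail independently, so a task survives if at least one replica succeeds; and a plain 0/1 knapsack solved by dynamic programming, in which each task's weight is its replicated workload. All function names and parameters (`shape`, `scale`, `target`, `capacity`, the example workloads and profits) are hypothetical and not taken from the paper.

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """Two-parameter Weibull reliability R(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def replicas_needed(t: float, shape: float, scale: float, target: float) -> int:
    """Smallest n such that 1 - (1 - R(t)) ** n >= target.

    Assumption: replicas fail independently, so the task fails only if all n fail.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    r = weibull_reliability(t, shape, scale)
    if r >= target:
        return 1
    if r <= 0.0:
        raise ValueError("a resource with zero reliability cannot meet any target")
    # Solve (1 - r) ** n <= 1 - target for n, then guard against rounding error.
    n = max(1, math.ceil(math.log(1.0 - target) / math.log(1.0 - r)))
    while 1.0 - (1.0 - r) ** n < target:
        n += 1
    return n


def knapsack(weights: list[int], profits: list[float], capacity: int) -> tuple[float, list[int]]:
    """0/1 knapsack via dynamic programming; returns (best profit, chosen indices)."""
    n = len(weights)
    # best[i][c]: max profit using the first i items within capacity c.
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, p = weights[i - 1], profits[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if w <= c and best[i - 1][c - w] + p > best[i][c]:
                best[i][c] = best[i - 1][c - w] + p
    # Walk the table backwards to recover which items were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen


if __name__ == "__main__":
    # Hypothetical example: replicate to reach 99.9% reliability over t = 10.
    n = replicas_needed(t=10, shape=1.5, scale=50, target=0.999)
    base_workload = [3, 4, 5]        # hypothetical per-replica workload of each task
    profit = [30.0, 50.0, 60.0]      # hypothetical profit of each task
    weights = [w * n for w in base_workload]
    print(n, knapsack(weights, profit, capacity=24))
```

Scaling each task's workload by its replica count is one simple way to make the replication overhead visible to the knapsack; the paper's actual resource management formulation may weigh utilization and profit differently.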