Grid Service Reliability Modeling and Optimal Task Scheduling Considering Fault Recovery

Considerable research has gone into tools and techniques for grid systems, yet some important issues, such as grid service reliability and task scheduling in the grid, have not been sufficiently studied. For grid services with large subtasks that require time-consuming computation, service reliability can be rather low. To address this problem, this paper introduces a Local Node Fault Recovery (LNFR) mechanism into grid systems and presents an in-depth study of grid service reliability modeling and analysis with this kind of fault recovery. To make the LNFR mechanism practical, some constraints are introduced, namely the lifetimes of subtasks and the numbers of recoveries performed in grid nodes, and grid service reliability models under these practical constraints are developed. Based on the proposed grid service reliability model, a multi-objective task scheduling optimization model is presented, and an ant colony optimization (ACO) algorithm is developed to solve it effectively. A numerical example illustrates the influence of fault recovery on grid service reliability and shows the high efficiency of ACO in solving the grid task scheduling problem.
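To make the effect of the two constraints concrete, the following is a minimal sketch and not the paper's model. It assumes that failures on a node arrive as a Poisson process while the subtask is executing, that each local recovery resumes the subtask from the point of failure at a fixed time cost, that no failure occurs during a recovery, and that a service succeeds only if all of its subtasks complete independently. The function names and parameters (`subtask_reliability`, `service_reliability`, `recovery_time`, `max_recoveries`, `lifetime`) are hypothetical and chosen for illustration only.

```python
import math


def subtask_reliability(work_time, failure_rate, recovery_time,
                        max_recoveries, lifetime):
    """Probability that a subtask completes under the assumptions above.

    With k failures the subtask needs work_time + k * recovery_time in total,
    so it succeeds when k stays within both the recovery limit and the
    number of recoveries that still fit inside the subtask lifetime.
    """
    if work_time > lifetime:
        return 0.0  # cannot finish even without any failure
    if recovery_time > 0:
        # Largest number of recoveries that fits in the lifetime
        # (plain floating-point floor; adequate for a sketch).
        by_deadline = math.floor((lifetime - work_time) / recovery_time)
    else:
        by_deadline = max_recoveries
    k_max = min(max_recoveries, by_deadline)

    # Number of failures during execution is Poisson with this mean (assumed).
    mean = failure_rate * work_time
    return sum(math.exp(-mean) * mean ** k / math.factorial(k)
               for k in range(k_max + 1))


def service_reliability(subtasks):
    """Product of subtask reliabilities (independence is an assumption).

    Each item is a tuple of the arguments of subtask_reliability.
    """
    result = 1.0
    for args in subtasks:
        result *= subtask_reliability(*args)
    return result
```

Under these assumptions, allowing no recovery reduces the subtask reliability to the probability of zero failures, `exp(-failure_rate * work_time)`, while each additional permitted recovery adds one more Poisson term, provided the lifetime leaves room for it. A scheduler such as the ACO algorithm mentioned above could use a function of this kind as one of its objectives when comparing candidate assignments of subtasks to nodes.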
