A decentralized fault tolerance model based on level of performance for grid environment

Computational grids have the potential to solve large-scale scientific problems using heterogeneous and geographically distributed resources. At this scale, failures of computer resources and networks are no longer exceptions but part of normal system behavior. Therefore, one of the most valuable characteristics of grid tools, apart from the performance they can achieve, is fault tolerance, which is a significant and complex issue in grid computing systems. In this paper, we propose a fault tolerance model for grid computing systems called DCFT. This model is based on dynamic colored graphs and does not replicate computer resources. The proposed fault tolerance model consists of two stages. In the first stage, each node is described by a state vector, and each attribute of the state vector is assigned one of three colors (green, blue or red) according to its level of performance. In the second stage, we classify the nodes of a grid into three categories: computer resources that are identical in terms of performance, those that are more efficient, and those that are less efficient. We use the colors of the nodes to develop a new fault tolerance strategy based on the level of performance. The proposed model is simulated using the SimGrid simulator and GraphStream. Experimental results show that the proposed model performs very well in a large grid environment.
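The two stages described above can be illustrated with a minimal sketch. It is not the DCFT implementation: the abstract does not say which color corresponds to which performance level, how attributes are compared, or how attribute colors combine into a node category. The sketch therefore assumes that each attribute is compared against a reference node (higher is better), that green means better, blue means comparable and red means worse, and that a 5% tolerance defines "comparable". The attribute names (`cpu_mips`, `memory_mb`, `bandwidth_mbps`) and all function names are hypothetical.

```python
from enum import Enum


class Color(Enum):
    GREEN = "green"  # assumption: better than the reference
    BLUE = "blue"    # assumption: comparable to the reference
    RED = "red"      # assumption: worse than the reference


def color_attribute(value, reference, tolerance=0.05):
    """Stage 1: color one attribute relative to a reference value.

    Assumes higher values mean better performance; the tolerance
    band is a hypothetical parameter, not taken from the paper.
    """
    if value > reference * (1 + tolerance):
        return Color.GREEN
    if value < reference * (1 - tolerance):
        return Color.RED
    return Color.BLUE


def color_state_vector(state, reference, tolerance=0.05):
    """Color every attribute of a node's state vector."""
    return {
        name: color_attribute(state[name], reference[name], tolerance)
        for name in reference
    }


def classify_node(colors):
    """Stage 2: map attribute colors to one of three node categories.

    Assumed rule: any red attribute makes the node less efficient;
    otherwise any green attribute makes it more efficient;
    otherwise it is identical in terms of performance.
    """
    present = set(colors.values())
    if Color.RED in present:
        return "less efficient"
    if Color.GREEN in present:
        return "more efficient"
    return "identical"


# Hypothetical state vectors for a reference node and three candidates.
reference = {"cpu_mips": 1000, "memory_mb": 4096, "bandwidth_mbps": 100}
same = {"cpu_mips": 1000, "memory_mb": 4096, "bandwidth_mbps": 100}
faster = {"cpu_mips": 2000, "memory_mb": 4096, "bandwidth_mbps": 100}
slower = {"cpu_mips": 500, "memory_mb": 8192, "bandwidth_mbps": 100}

for name, node in [("same", same), ("faster", faster), ("slower", slower)]:
    print(name, classify_node(color_state_vector(node, reference)))
```

Under these assumptions, a fault tolerance strategy could use the resulting categories to decide which surviving node takes over the work of a failed one, without keeping replicas of computer resources. How DCFT actually makes that choice is not stated in the abstract.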
