Using Monte Carlo Simulation in Grid Computing Systems for Reliability Estimation

Grid Computing Systems (GCS) have been proposed as an effective technology for coupling geographically distributed resources to serve applications that require large amounts of computation and resources. Reliability has proved to be one of the most important criteria in grid systems. Because dynamic access to the required resources in such systems is complex, achieving reliability tends to be very difficult. In this paper we propose an algorithm, based on Monte Carlo simulation, for estimating the reliability of programs and of the grid system. The proposed algorithm takes into account the data transmission rate over the links between nodes and the processing time on the nodes to estimate the reliability of the nodes and links involved. It requires less running time and has lower complexity, which makes it appropriate for GCS.
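The idea of combining transmission rate and processing time into a Monte Carlo reliability estimate can be illustrated with a minimal sketch. It is not the paper's algorithm, and it relies on assumptions the abstract does not state. Every node and link is assumed to fail independently with a constant failure rate (exponential lifetime). A node must stay up for its processing time, and a link must stay up for its transmission time (data size divided by transmission rate). A program run succeeds only if every element it uses survives its busy time. The function names (`busy_times`, `monte_carlo_reliability`, `analytic_reliability`) and all numbers are hypothetical.

```python
import math
import random

# Hypothetical model (assumption, not taken from the paper):
#   * every node and link has a constant failure rate (exponential lifetime);
#   * a node must stay up for its processing time;
#   * a link must stay up for its transmission time = data size / rate.
# A program run succeeds only if every element it uses survives its busy time.

def busy_times(nodes, links):
    """Return (failure_rate, busy_time) pairs for all elements the program uses."""
    elems = [(rate, proc_time) for rate, proc_time in nodes]
    elems += [(rate, size / trans_rate) for rate, size, trans_rate in links]
    return elems

def monte_carlo_reliability(nodes, links, trials=100_000, seed=1):
    """Estimate program reliability by sampling element lifetimes."""
    rng = random.Random(seed)
    elems = busy_times(nodes, links)
    successes = 0
    for _ in range(trials):
        # expovariate(rate) draws a time-to-failure with mean 1/rate;
        # the run fails if any element fails before finishing its work.
        if all(rng.expovariate(rate) >= busy for rate, busy in elems):
            successes += 1
    return successes / trials

def analytic_reliability(nodes, links):
    """Closed form for this independent-exponential model: exp(-sum(rate * busy))."""
    return math.exp(-sum(rate * busy for rate, busy in busy_times(nodes, links)))

# Illustrative numbers only: (failure rate, processing time) per node and
# (failure rate, data size, transmission rate) per link.
nodes = [(0.002, 30.0), (0.001, 45.0)]
links = [(0.0005, 200.0, 10.0)]

print(monte_carlo_reliability(nodes, links))
print(analytic_reliability(nodes, links))  # exp(-0.115), about 0.8914
```

In this simple series model, every element is required, so a closed form exists and serves only as a check on the simulation. Sampling becomes more useful when a program can finish over alternative paths or redundant nodes. In that case, the `all(...)` success test would be replaced by whatever structure function describes which combinations of surviving elements are sufficient.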
