Reliability Analysis of Distributed Systems Based on a Fast Reliability Algorithm

The reliability of a distributed processing system (DPS) can be expressed through two measures: distributed program reliability (DPR) and distributed system reliability (DSR). One effective way to formulate these reliability indexes is to generate all disjoint file spanning trees (FSTs) in the DPS graph, so that the DPR and DSR can be expressed as the probability that at least one of these FSTs is working. This paper presents a unified algorithm that efficiently generates disjoint FSTs by cutting different links, and computes the DPR and DSR with a simple and consistent union operation on the probability space of the FSTs. Related DPS reliability problems are also discussed. To speed up the reliability evaluation, node-merging, series-reduction, and parallel-reduction concepts are incorporated into the algorithm. A comparison of the number of subgraphs (or FSTs) generated by the proposed algorithm and by existing evaluation algorithms indicates that the proposed algorithm is much more economical in time and space than the existing algorithms.
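
The quantity described above, the probability that at least one FST is working, can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses plain inclusion-exclusion over a given list of trees, which grows exponentially with the number of trees and is exactly the cost that generating disjoint FSTs is meant to avoid. The function name `union_reliability`, the element labels, and the probabilities are hypothetical, and element failures are assumed to be statistically independent.

```python
from itertools import combinations


def union_reliability(trees, p):
    """Probability that at least one tree has all of its elements working.

    trees: list of frozensets of element labels (hypothetical FSTs)
    p:     dict mapping element label -> probability that it works
           (assumed independent of every other element)
    """
    total = 0.0
    for k in range(1, len(trees) + 1):
        # Inclusion-exclusion: add odd-sized intersections, subtract even ones.
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(trees, k):
            # All k trees work exactly when every element in their union works.
            elements = frozenset().union(*subset)
            prob = 1.0
            for e in elements:
                prob *= p[e]
            total += sign * prob
    return total


# Hypothetical example: two trees that share element "a".
trees = [frozenset({"a", "b"}), frozenset({"a", "c"})]
p = {"a": 0.9, "b": 0.9, "c": 0.9}
reliability = union_reliability(trees, p)
print(round(reliability, 6))
```

For this example the result is P(ab) + P(ac) - P(abc) = 0.81 + 0.81 - 0.729 = 0.891. When the trees are replaced by mutually disjoint events, the correction terms vanish and the probabilities can simply be summed.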
