The Reliability Analysis of Distributed Computing Systems with Imperfect Nodes

The reliability of a distributed computing system depends on the reliability of its communication links and nodes and on the distribution of its resources, such as programs and data files. Many algorithms have been proposed for computing this reliability, but they have mostly been applied to systems with perfect nodes. In practice, however, nodes as well as links may fail. This paper proposes two new algorithms for computing the reliability of a distributed computing system with imperfect nodes. Algorithm I takes a symbolic approach that requires two passes of computation. Algorithm II applies a general factoring technique to both nodes and edges. Comparisons with existing methods show that the proposed algorithms are useful for computing the reliability of large distributed computing systems.
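
To make the idea of factoring on both nodes and edges concrete, the following is a minimal sketch and not the paper's Algorithm II. It assumes a simplified problem: the system works when a given set of terminal nodes is up and mutually connected, components fail independently, and the distribution of programs and data files is ignored. The names `connected` and `reliability` and the dictionary-based graph encoding are hypothetical choices made for this illustration. The sketch conditions on one undecided component at a time, R = p * R(component up) + (1 - p) * R(component down), and stops early when the outcome is already determined.

```python
def connected(terminals, nodes, edges):
    """True if every terminal is in `nodes` and all are mutually reachable via `edges`."""
    terminals = list(terminals)
    if any(t not in nodes for t in terminals):
        return False
    seen, stack = {terminals[0]}, [terminals[0]]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u:
                v = b
            elif b == u:
                v = a
            else:
                continue
            # An edge is usable only if its far endpoint is also available.
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return all(t in seen for t in terminals)


def reliability(terminals, node_p, edge_p, up=frozenset(), down=frozenset()):
    """Probability that all terminals are up and connected (illustrative only).

    node_p: {node: working probability}
    edge_p: {(u, v): working probability}, undirected edges
    up / down: components already conditioned to work / to fail,
               tagged as ('n', node) or ('e', (u, v)).
    """
    # Optimistic view: every component not known to be down works.
    alive_nodes = {n for n in node_p if ('n', n) not in down}
    alive_edges = [e for e in edge_p if ('e', e) not in down]
    if not connected(terminals, alive_nodes, alive_edges):
        return 0.0  # failure is already certain

    # Pessimistic view: only components known to be up work.
    sure_nodes = {n for n in node_p if ('n', n) in up}
    sure_edges = [e for e in edge_p if ('e', e) in up]
    if connected(terminals, sure_nodes, sure_edges):
        return 1.0  # success is already certain

    # Pick any undecided component (node or edge) as the pivot.
    components = [('n', n) for n in node_p] + [('e', e) for e in edge_p]
    pivot = next(c for c in components if c not in up and c not in down)
    p = node_p[pivot[1]] if pivot[0] == 'n' else edge_p[pivot[1]]

    # Factoring step: condition on the pivot working, then on it failing.
    works = reliability(terminals, node_p, edge_p, up | {pivot}, down)
    fails = reliability(terminals, node_p, edge_p, up, down | {pivot})
    return p * works + (1 - p) * fails
```

For example, calling `reliability({'s', 't'}, {'s': 0.9, 't': 0.9}, {('s', 't'): 0.8})` evaluates a two-node series system whose exact reliability is 0.9 × 0.9 × 0.8. The pivot is chosen arbitrarily here, so the running time is exponential in the worst case; practical factoring algorithms combine this step with pivot selection heuristics and graph reductions.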
