Efficient algorithms for reliability analysis of distributed computing systems

A distributed computing system is modeled as a collection of resources (e.g., processing elements, data files, and programs) interconnected via an arbitrary communication network and controlled by a distributed operating system. The distributed program reliability of such a system is the probability that a program, which runs on multiple processing elements and needs to retrieve data files from other processing elements, executes successfully. This reliability varies according to (1) the topology of the distributed computing system, (2) the reliability of the communication edges, (3) the distribution of data files and programs among the processing elements, and (4) the data files required to execute a program. Moreover, computing the reliability of distributed computing systems is #P-complete even when the system is restricted to a series-parallel, a 2-tree, a tree, or a star structure. This paper presents efficient algorithms for computing the reliability of a distributed program running on other restricted classes of networks. © 1999 Elsevier Science Inc. All rights reserved.
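
To make the definition of distributed program reliability concrete, the following is a minimal brute-force sketch. It is not one of the paper's algorithms. It assumes a simplified model that the abstract does not state: the program is hosted at a single processing element, nodes never fail, and edges fail independently. The function name `program_reliability` and its parameters are hypothetical. The enumeration visits every edge state, so its running time is exponential in the number of edges and it is only practical for very small networks.

```python
from itertools import product


def program_reliability(edges, files_at, host, needed):
    """Probability that `host` can reach every file in `needed`.

    edges    -- dict mapping an undirected edge (u, v) to its reliability
    files_at -- dict mapping a node to the set of files stored there
    host     -- node that runs the program (assumed perfectly reliable)
    needed   -- set of files the program must retrieve
    """
    edge_list = list(edges)
    total = 0.0
    # Enumerate all 2^|E| combinations of working / failed edges.
    for state in product([True, False], repeat=len(edge_list)):
        prob = 1.0
        adjacency = {}
        for (u, v), up in zip(edge_list, state):
            p = edges[(u, v)]
            prob *= p if up else 1.0 - p
            if up:
                adjacency.setdefault(u, set()).add(v)
                adjacency.setdefault(v, set()).add(u)
        # Depth-first search over the working edges, starting at the host.
        reachable = {host}
        stack = [host]
        while stack:
            node = stack.pop()
            for neighbor in adjacency.get(node, ()):
                if neighbor not in reachable:
                    reachable.add(neighbor)
                    stack.append(neighbor)
        # The program succeeds if the reachable nodes hold all needed files.
        available = set().union(*(files_at.get(n, set()) for n in reachable))
        if needed <= available:
            total += prob
    return total


# Star network: node 1 hosts the program, f1 is at node 2, f2 is at node 3.
star = {(1, 2): 0.9, (1, 3): 0.8}
placement = {2: {"f1"}, 3: {"f2"}}
print(program_reliability(star, placement, 1, {"f1", "f2"}))
```

In the star example both edges must work, so the result equals the product of the two edge reliabilities. Changing the file placement or the required file set changes the result without changing the topology, which corresponds to factors (3) and (4) listed in the abstract.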