A heuristic task assignment algorithm to maximize reliability of a distributed system

Distributed systems can provide high reliability because programs and data files can be stored redundantly. In many applications, high reliability is the major consideration in system design. Previous work has shown that the distribution of programs and data files can affect system reliability appreciably, and that redundancy in resources such as computers, programs, and data files can improve the reliability of a distributed system. This work formulates a practical application for a reliability-oriented distributed task assignment problem, which is NP-hard. To cope with this challenging problem, a greedy algorithm based on several heuristics is proposed to find an approximate solution. Simulation shows that, in most cases tested, the algorithm finds suboptimal solutions efficiently; it is therefore a desirable approach to solving these problems.
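The abstract does not give the algorithm's details, so the following is only an illustrative sketch of what a greedy, heuristic-driven assignment can look like, not the authors' method. It assumes a deliberately simple model that is not taken from the paper: each task has a load, each processor has a capacity and an independent reliability, and system reliability is the product of the reliabilities of the processors in use. The names `greedy_assign` and `system_reliability` are hypothetical.

```python
from math import prod


def greedy_assign(task_loads, proc_reliability, proc_capacity):
    """Greedy heuristic (assumed, not the paper's): heaviest task first,
    each placed on the most reliable processor that still has room."""
    remaining = dict(proc_capacity)  # capacity left on each processor
    assignment = {}
    # Place the heaviest tasks first so they are not squeezed out later.
    for task in sorted(task_loads, key=task_loads.get, reverse=True):
        load = task_loads[task]
        candidates = [p for p in remaining if remaining[p] >= load]
        if not candidates:
            raise ValueError(f"no processor can host task {task!r}")
        best = max(candidates, key=proc_reliability.get)
        assignment[task] = best
        remaining[best] -= load
    return assignment


def system_reliability(assignment, proc_reliability):
    """Toy series model (assumed): every processor in use must stay up."""
    return prod(proc_reliability[p] for p in set(assignment.values()))
```

Because each task is placed once and never moved, the run time is polynomial in the number of tasks and processors, but the result is only an approximation; that is the usual trade-off accepted when the exact problem is NP-hard.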
