Algorithms for reliability-oriented module allocation in distributed computing systems

Abstract: We consider the problem of allocating program modules to the processors of a distributed computing system so that the reliability of successfully executing these modules is maximized. The distributed system consists of a number of processors interconnected by communication links. Constraints such as storage and load limits may be present at each processor. At any point in time, each component of the distributed system (processor or communication link) is in one of two states: operational or failed. The probability of each component being operational is given. A program module can be executed on any one of a set of processors, and for execution it requires access to certain data files. If a required file is not available locally, the module has to access it remotely. Remote access is possible only if at least one path (a sequence of links and processors) is operational from the processor on which the module is executing to one of the processors where the required file is available. To improve reliability, there may be multiple copies of certain files, dispersed across various processors. Our aim is to allocate the program modules to processors in a manner that maximizes the probability that each module can successfully access all the files it requires for execution, without violating any of the constraints. This problem is known to be NP-hard. We use a state space search technique, the A* algorithm, to obtain an optimal allocation. We also present a heuristic algorithm that obtains sub-optimal allocations in a reasonable amount of computation time. Through simulations over a wide range of parameters, we demonstrate the effectiveness of our approach.
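The reliability measure described in the abstract can be illustrated with a small sketch. This is not the paper's A* or heuristic algorithm: it is a brute-force enumeration of component states, practical only for tiny systems, and it ignores storage and load constraints. The function names `access_reliability` and `best_host`, the data layout, and the example probabilities are all assumptions made for illustration.

```python
from itertools import product


def access_reliability(proc_up, link_up, host, file_copies):
    """Probability that a module on `host` can reach a copy of every required file.

    proc_up:     dict processor -> probability of being operational
    link_up:     dict (u, v) -> probability of being operational (undirected link)
    file_copies: list of sets; each set holds the processors storing one file
    """
    comps = [("p", p) for p in proc_up] + [("l", l) for l in link_up]
    probs = [proc_up[c] if kind == "p" else link_up[c] for kind, c in comps]
    total = 0.0
    # Enumerate every operational/failed combination (2^n states).
    for state in product([True, False], repeat=len(comps)):
        weight = 1.0
        for up, pr in zip(state, probs):
            weight *= pr if up else 1.0 - pr
        alive = {c for (kind, c), up in zip(comps, state) if up and kind == "p"}
        links = [c for (kind, c), up in zip(comps, state) if up and kind == "l"]
        if host not in alive:
            continue  # the module cannot run on a failed processor
        # Graph search restricted to operational processors and links.
        reached, frontier = {host}, [host]
        while frontier:
            u = frontier.pop()
            for a, b in links:
                for x, y in ((a, b), (b, a)):
                    if x == u and y in alive and y not in reached:
                        reached.add(y)
                        frontier.append(y)
        # Success if at least one copy of each file is reachable.
        if all(reached & copies for copies in file_copies):
            total += weight
    return total


def best_host(proc_up, link_up, candidates, file_copies):
    """Pick the candidate processor with the highest access reliability."""
    return max(
        candidates,
        key=lambda h: access_reliability(proc_up, link_up, h, file_copies),
    )


# Hypothetical two-processor system with the only file copy stored on B.
procs = {"A": 0.9, "B": 0.9}
links = {("A", "B"): 0.8}
on_a = access_reliability(procs, links, "A", [{"B"}])  # needs A, the link, and B
on_b = access_reliability(procs, links, "B", [{"B"}])  # needs only B
choice = best_host(procs, links, ["A", "B"], [{"B"}])
```

In this example, running the module on A requires A, the link, and B to be operational at once, whereas running it on B requires only B, so B is the more reliable host. The enumeration grows exponentially with the number of components, which is consistent with the abstract's motivation for a guided search and a heuristic.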
