A process migration subsystem for a workstation-based distributed system

Workstation-based distributed computing environments are becoming popular in both academic and commercial communities. Two trends drive this: the cost/performance ratio of hardware continues to fall, and networking technology is developing rapidly. However, the workload on these workstations is usually much lower than their computing capacity, especially as new hardware becomes ever more powerful. As a result, the resources of such workstations are often underutilized, and many of them are frequently idle. A distributed system of this kind can provide a preemptive process migration facility that dynamically relocates running processes among its component machines. Such relocation can help cope with dynamic fluctuations in load and service demand, improve the system's fault tolerance, meet real-time scheduling deadlines, or bring a process to a special device. This paper presents a process migration subsystem for tolerating process and node failures in a workstation-based environment, and discusses its design and implementation.
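The relocation idea described above can be illustrated with a small placement policy. This is a minimal, hypothetical sketch and not the subsystem presented in the paper. The `Node` record, the per-node `capacity`, the `owner_active` flag, and the least-loaded placement rule are all assumptions chosen for illustration. The sketch moves processes off workstations that have failed or been reclaimed by their owners and places them on the least-loaded workstation that is still available.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one workstation in the pool."""
    name: str
    capacity: int                 # assumed max number of hosted processes (must be > 0)
    alive: bool = True            # False once the node is detected as failed
    owner_active: bool = False    # True when the owner reclaims the workstation
    procs: list = field(default_factory=list)  # ids of processes hosted here

    def load(self) -> float:
        # Fraction of assumed capacity in use.
        return len(self.procs) / self.capacity

    def available(self) -> bool:
        # A node may accept work only if it is up, idle, and not full.
        return self.alive and not self.owner_active and len(self.procs) < self.capacity


def pick_destination(nodes, exclude):
    """Return the least-loaded available node other than `exclude`, or None."""
    candidates = [n for n in nodes if n is not exclude and n.available()]
    return min(candidates, key=lambda n: n.load(), default=None)


def relocate(nodes):
    """Move processes off failed or reclaimed nodes.

    Returns a list of (pid, source, destination) moves. Processes that cannot
    be placed stay on their source node. For a failed node, a real system would
    have to restart the process from a checkpoint; this sketch models placement only.
    """
    moves = []
    for src in nodes:
        if src.alive and not src.owner_active:
            continue  # healthy, idle node: nothing to evacuate
        for pid in list(src.procs):  # iterate over a copy while mutating
            dst = pick_destination(nodes, exclude=src)
            if dst is None:
                break  # no room anywhere; leave remaining processes in place
            src.procs.remove(pid)
            dst.procs.append(pid)
            moves.append((pid, src.name, dst.name))
    return moves


if __name__ == "__main__":
    pool = [Node("ws1", 2, procs=["p1", "p2"]), Node("ws2", 4, procs=["p3"]), Node("ws3", 4)]
    pool[0].owner_active = True  # the owner of ws1 returns to the workstation
    print(relocate(pool))  # [('p1', 'ws1', 'ws3'), ('p2', 'ws1', 'ws2')]
```

Choosing the least-loaded destination spreads evacuated processes across the idle machines instead of piling them onto one. A real migration facility would also need to transfer each process's state (address space, open files, communication links), which this sketch does not attempt.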
