A protocol for load sharing among a cluster of heterogeneous Unix workstations

In this paper we propose a protocol for load sharing among a cluster of heterogeneous Unix workstations. Our protocol, called the Distributed Process Management Protocol (DPMP), not only enables load sharing using nonpreemptive process migration but also seamlessly integrates the processes running on a network of machines. Remote processes can be accessed (for signalling, for example) in the same way as local processes, making process migration highly transparent to users and applications. DPMP also has built-in mechanisms to detect and recover from node and network failures. DPMP can be implemented at either the kernel or the user level. We also describe an implementation of DPMP within the Linux kernel. Preliminary performance studies show that the performance gains obtained by using DPMP are substantial.
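
The two ideas named in the abstract, nonpreemptive placement and uniform access to local and remote processes, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not DPMP itself: the `Node` and `Cluster` classes, the scalar `load` metric, the least-loaded placement rule, and the `(node name, local pid)` identifier are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical workstation; 'load' is an assumed scalar load metric."""
    name: str
    load: float
    alive: bool = True
    processes: dict = field(default_factory=dict)  # local pid -> signals received
    next_pid: int = 1


class Cluster:
    """Toy cluster view; not the DPMP wire protocol."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def spawn(self, cost=1.0):
        # Nonpreemptive placement: the host is chosen once, when the
        # process is created, and the process is never moved afterwards.
        candidates = [n for n in self.nodes.values() if n.alive]
        if not candidates:
            raise RuntimeError("no live nodes in the cluster")
        target = min(candidates, key=lambda n: n.load)  # assumed policy
        pid = target.next_pid
        target.next_pid += 1
        target.processes[pid] = []
        target.load += cost
        return (target.name, pid)  # assumed cluster-wide identifier

    def kill(self, gpid, signal):
        # One call signals any process, whichever node it runs on.
        node_name, pid = gpid
        node = self.nodes[node_name]
        if not node.alive or pid not in node.processes:
            raise ProcessLookupError(f"no such process: {gpid}")
        node.processes[pid].append(signal)
```

In this sketch a caller signals a process through `Cluster.kill` without knowing which node hosts it, and a node marked as not alive is skipped by later placements. A real implementation would additionally need failure detection and recovery, which the abstract mentions but this model does not attempt.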
