DAWGS - A Distributed Compute Server Utilizing Idle Workstations

Abstract: A collection of powerful workstations interconnected by a local area network forms a large computing resource. The problem of locating and efficiently using this resource has been the subject of much study. When the system is composed of workstations, an attractive technique is to make use of workstations left idle by their owners. The Distributed Automated Workload balancinG System (DAWGS) is designed to let users apply this networked computing power to their programs. Essentially, DAWGS is an interface between the user and the kernel that allows users to submit batch or interactive processes (jobs) for execution on an idle workstation somewhere on a local area network. To determine which machine a process runs on, DAWGS uses a distributed scheduler based on a bidding scheme that resolves many of the problems usually associated with bidding. It redirects all I/O from the remotely running process back to the machine on which the process originated. DAWGS can checkpoint and restart any type of process, including interactive ones, even when the restart takes place on a machine different from the one on which the process was previously running. We show that running processes remotely on idle workstations can result in significantly lower execution times, particularly for processes with a large execution time. Our method differs from previous work in that it is fault-tolerant, maintains total remote execution transparency for the user, and is fully distributed.
