Cheap cycles from the desktop to the dedicated cluster: combining opportunistic and dedicated scheduling with Condor

Clusters of commodity PC hardware running Linux are becoming widely used as computational resources. Most software for controlling clusters relies on dedicated scheduling algorithms. These algorithms assume the constant availability of resources to compute fixed schedules. Unfortunately, due to hardware and software failures, dedicated resources are not always available over the long-term. Moreover, these dedicated scheduling solutions are only applicable to certain classes of jobs, and they can only manage clusters or large SMP machines. The Condor High Throughput Computing System overcomes these limitations by combining aspects of dedicated and opportunistic scheduling into a single system. Both parallel and serial jobs are managed at the same time, allowing a simpler interface for the user and better resource utilization. This paper describes the Condor system, defines opportunistic scheduling, explains how Condor supports MPI jobs with a combination of dedicated and opportunistic scheduling, and shows the advantages gained by such an approach. An exploration of future work in these areas concludes the paper. By using both desktop workstations and dedicated clusters, Condor harnesses all available computational power to enable the best possible science at a low cost.

[1]  Miron Livny,et al.  Condor-a hunter of idle workstations , 1988, [1988] Proceedings. The 8th International Conference on Distributed.

[2]  M. Litzkow REMOTE UNIX TURNING IDLE WORKSTATIONS INTO CYCLE SERVERS , 1992 .

[3]  Message Passing Interface Forum MPI: A message - passing interface standard , 1994 .

[4]  Jack Dongarra,et al.  PVM: Parallel virtual machine: a users' guide and tutorial for networked parallel computing , 1995 .

[5]  Miron Livny,et al.  A worldwide flock of Condors: Load sharing among workstation clusters , 1996, Future Gener. Comput. Syst..

[6]  Georg Stellner,et al.  CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.

[7]  William Gropp,et al.  Users guide for mpich, a portable implementation of MPI , 1996 .

[8]  Anthony Skjellum,et al.  A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard , 1996, Parallel Comput..

[9]  Miron Livny,et al.  Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System , 1997 .

[10]  Miron Livny,et al.  Mechanisms for High Throughput Computing , 1997 .

[11]  Rajesh Raman,et al.  Matchmaking: distributed resource management for high throughput computing , 1998, Proceedings. The Seventh International Symposium on High Performance Distributed Computing (Cat. No.98TB100244).

[12]  Jack Dongarra,et al.  MPI - The Complete Reference: Volume 1, The MPI Core , 1998 .

[13]  William Gropp,et al.  Mpi - The Complete Reference: Volume 2, the Mpi Extensions , 1998 .

[14]  William Gropp,et al.  MPI: The Complete Reference , Vol. 2 - The MPI-2 Extensions , 1998 .

[15]  Dror G. Feitelson,et al.  Utilization and Predictability in Scheduling the IBM SP2 with Backfilling , 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.

[16]  Zhou Lei,et al.  The portable batch scheduler and the maui scheduler on linux clusters , 2000 .

[17]  Greg Burns,et al.  LAM: An Open Cluster Environment for MPI , 2002 .