Extending Proportional-Share Scheduling to a Network of Workstation

As networks of workstations (NOW) emerge as a viable platform for a wide range of workloads, a new scheduling approach is needed to allocate the collection of resources across competing users. In this paper, we show that extensions to a proportional-share scheduler for improving response time can still fairly allocate resources to a mix of sequential, interactive, and parallel jobs in this distributed environment. We find that a proportional-share scheduler, specifically a stride-scheduler, running on each node in the cluster is a good building-block. Simple extensions are implemented and analyzed which provide better response-times for interactive jobs by giving those jobs their share of resources over a longer time-interval. When scheduling jobs across the cluster, we show that fairness can be guaranteed if each local scheduler knows the number of tickets issued to each user and if the tickets are balanced across all workstations. Finally, we show that a proportional-share of resources can be provided to timeshared parallel applications through a combination of stridescheduling and implicit coscheduling.

[1]  Andrea C. Arpaci-Dusseau,et al.  The interaction of parallel and sequential workloads on a network of workstations , 1995, SIGMETRICS '95/PERFORMANCE '95.

[2]  Michael Mitzenmacher,et al.  Load balancing and density dependent jump Markov processes , 1996, Proceedings of 37th Conference on Foundations of Computer Science.

[3]  Anoop Gupta,et al.  Hive: fault containment for shared-memory multiprocessors , 1995, SOSP.

[4]  Andrea C. Arpaci-Dusseau,et al.  High-performance sorting on networks of workstations , 1997, SIGMOD '97.

[5]  Harrick M. Vin,et al.  A hierarchial CPU scheduler for multimedia operating systems , 1996, OSDI '96.

[6]  John K. Ousterhout Scheduling Techniques for Concurrebt Systems. , 1982, ICDCS 1982.

[7]  David E. Culler,et al.  Active message applications programming interface and communication subsystem organization , 1995 .

[8]  John K. Ousterhout,et al.  Scheduling Techniques for Concurrent Systems , 1982, ICDCS.

[9]  Scott Pakin,et al.  Dynamic Coscheduling on Workstation Clusters , 1998, JSSPP.

[10]  James R. Larus,et al.  Cooperative shared memory: software and hardware for scalable multiprocessors , 1993, TOCS.

[11]  Miron Livny,et al.  Scheduling Remote Processing Capacity in a Workstation-Processor Bank Network , 1987, ICDCS.

[12]  Donald F. Ferguson,et al.  Microeconomic algorithms for load balancing in distributed computer systems , 1988, [1988] Proceedings. The 8th International Conference on Distributed.

[13]  W. E. Weihl,et al.  An object-oriented framework for modular resource management , 1996, Proceedings of the Fifth International Workshop on Object-Orientation in Operation Systems.

[14]  Hussein M. Abdel-Wahab,et al.  A proportional share resource allocation algorithm for real-time, time-shared systems , 1996, 17th IEEE Real-Time Systems Symposium.

[15]  Fred Douglis,et al.  Transparent process migration: Design alternatives and the sprite implementation , 1991, Softw. Pract. Exp..

[16]  Bryan Ford,et al.  CPU inheritance scheduling , 1996, OSDI '96.

[17]  Marvin Theimer,et al.  Finding idle machines in a workstation-based distributed system , 1988, [1988] Proceedings. The 8th International Conference on Distributed.

[18]  Carl A. Waldspurger,et al.  Stride Scheduling: Deterministic Proportional- Share Resource Management , 1995 .

[19]  Andrea C. Arpaci-Dusseau,et al.  Effective distributed scheduling of parallel workloads , 1996, SIGMETRICS '96.

[20]  Victor Lee,et al.  Implications of I/O for Gang Scheduled Workloads , 1997, JSSPP.

[21]  Edward D. Lazowska,et al.  Adaptive load sharing in homogeneous distributed systems , 1986, IEEE Transactions on Software Engineering.

[22]  Miron Livny,et al.  Condor-a hunter of idle workstations , 1988, [1988] Proceedings. The 8th International Conference on Distributed.

[23]  Joseph L. Hellerstein,et al.  Achieving Service Rate Objectives with Decay Usage Scheduling , 1993, IEEE Trans. Software Eng..

[24]  Joel H. Saltz,et al.  The utility of exploiting idle workstations for parallel computation , 1997, SIGMETRICS '97.

[25]  Marvin Theimer,et al.  Preemptable remote execution facilities for the V-system , 1985, SOSP '85.

[26]  S. Zhou,et al.  A Trace-Driven Simulation Study of Dynamic Load Balancing , 1987, IEEE Trans. Software Eng..

[27]  Kai Li,et al.  Retrospective: virtual memory mapped network interface for the SHRIMP multicomputer , 1994, ISCA '98.

[28]  Hussein M. Abdel-Wahab,et al.  A Microeconomic Scheduler for Parallel Computers , 1995, JSSPP.

[29]  William E. Weihl,et al.  Lottery scheduling: flexible proportional-share resource management , 1994, OSDI '94.

[30]  Patrick Sobalvarro,et al.  Demand-Based Coscheduling of Parallel Jobs on Multiprogrammed Multiprocessors , 1995, JSSPP.

[31]  Tad Hogg,et al.  Spawn: A Distributed Computational Economy , 1992, IEEE Trans. Software Eng..

[32]  Seth Copen Goldstein,et al.  Active messages: a mechanism for integrating communication and computation , 1998, ISCA '98.

[33]  Seth Copen Goldstein,et al.  Active messages: a mechanism for integrating communication and computation , 1998, ISCA '98.

[34]  Thorsten von Eicken,et al.  U-Net: a user-level network interface for parallel and distributed computing , 1995, SOSP.

[35]  David E. Culler,et al.  A case for NOW (networks of workstation) , 1995, PODC '95.

[36]  Judy Kay,et al.  A fair share scheduler , 1988, CACM.

[37]  Jingwen Wang,et al.  Utopia: A load sharing facility for large, heterogeneous distributed computer systems , 1993, Softw. Pract. Exp..

[38]  Andrea C. Arpaci-Dusseau,et al.  Parallel computing on the berkeley now , 1997 .

[39]  Mor Harchol-Balter,et al.  Exploiting process lifetime distributions for dynamic load balancing , 1995, SIGMETRICS.