Flexible coscheduling: mitigating load imbalance and improving utilization of heterogeneous resources

Fine-grained parallel applications require all their processes to run simultaneously on distinct processors to achieve good efficiency. This is typically accomplished by space slicing, in which nodes are dedicated to a job for the duration of its run, or by gang scheduling, in which time slicing is coordinated across processors. Both schemes suffer from fragmentation: processors are left idle because jobs cannot be packed perfectly, which reduces utilization and degrades performance. Flexible coscheduling (FCS) addresses this problem by monitoring each job's granularity and communication activity and using gang scheduling only for those jobs that require it. Processes from other jobs, which can be scheduled without any constraints, are used as filler to reduce fragmentation. Because the classification is done on a per-process basis, inefficiencies due to load imbalance and hardware heterogeneity are reduced as well. FCS has been fully implemented as part of the STORM resource manager and has been shown to be competitive with gang scheduling and implicit coscheduling.
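
The per-process classification idea can be illustrated with a minimal Python sketch. This is not the FCS implementation in STORM: the names `ProcessStats`, `needs_gang_scheduling`, `split_processes`, and `pick_for_slot`, the two measured quantities as modeled here, and both thresholds are assumptions made purely for illustration. The sketch only shows the shape of the decision: processes that are fine-grained and communicate heavily are run in their job's coordinated time slot, and all other processes are eligible as filler.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical thresholds, chosen only for illustration.
FINE_GRAIN_MS = 1.0          # compute time between communications below this is "fine-grained"
COMM_ACTIVE_FRACTION = 0.1   # fraction of time spent waiting on communication above this is "active"


@dataclass
class ProcessStats:
    """Per-process measurements (hypothetical model of what a monitor might record)."""
    job_id: str
    granularity_ms: float    # mean compute time between communication events
    comm_fraction: float     # fraction of wall time spent waiting on communication


def needs_gang_scheduling(p: ProcessStats) -> bool:
    """A process is treated as needing coscheduling if it is fine-grained and communicates a lot."""
    return p.granularity_ms < FINE_GRAIN_MS and p.comm_fraction > COMM_ACTIVE_FRACTION


def split_processes(procs: List[ProcessStats]) -> Tuple[List[ProcessStats], List[ProcessStats]]:
    """Partition processes into those that need gang scheduling and those usable as filler."""
    gang = [p for p in procs if needs_gang_scheduling(p)]
    filler = [p for p in procs if not needs_gang_scheduling(p)]
    return gang, filler


def pick_for_slot(slot_job: str, node_procs: List[ProcessStats]) -> Optional[ProcessStats]:
    """Choose what one node runs during the time slot assigned to slot_job.

    Prefer the slot job's own process if it needs coscheduling; otherwise
    fill the slot with any unconstrained process instead of idling.
    """
    for p in node_procs:
        if p.job_id == slot_job and needs_gang_scheduling(p):
            return p
    for p in node_procs:
        if not needs_gang_scheduling(p):
            return p
    return None  # nothing runnable without breaking a coscheduling constraint


procs = [
    ProcessStats("A", granularity_ms=0.2, comm_fraction=0.5),    # fine-grained, communicating
    ProcessStats("B", granularity_ms=50.0, comm_fraction=0.02),  # coarse-grained
]
gang, filler = split_processes(procs)
```

In this toy setting, a node that holds no process of the job owning the current slot would run the process of job `B` as filler, which is the fragmentation-reducing behavior the abstract describes. Because the decision is made per process, two processes of the same job could in principle be classified differently on differently loaded or differently capable nodes.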
