A comparison of local and gang scheduling on a Beowulf cluster

Gang scheduling and related techniques are widely believed to be necessary for efficient job scheduling on distributed memory parallel computers. This is because they minimize context switching overheads and permit the parallel job currently running to progress at the fastest possible rate. However, on cluster computers, and particularly those with COTS networks, these benefits can be outweighed when multiple jobs time-share the machine, because the CPU can no longer be used for other jobs while the current job is waiting for messages. Experiments on a Linux Beowulf cluster with 100 Mb Fast Ethernet switches compare the SCore buddy-based gang scheduling with local scheduling (provided by the Linux 2.4 kernel with MPI implemented over TCP/IP). Results for communication-intensive numerical applications on 16 nodes reveal that gang scheduling yields 'slowdowns' up to a factor of two greater than local scheduling for 8 simultaneous jobs. This phenomenon is not due to any deficiency in SCore but to the relative costs of context switching versus message overhead, and we expect similar results to hold for any gang scheduling implementation. A performance analysis of local scheduling indicates that, for the applications studied, cache pollution due to context switching is more significant than the direct context switching overhead. When this is taken into account, local scheduling comes close to achieving ideal slowdowns for finer-grained computations such as Linpack. The performance models also indicate that similar trends are to be expected for clusters with faster networks.
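
The trade-off described above can be illustrated with a deliberately simple toy model. It is not the performance model used in the paper; it is a sketch under stated assumptions. Slowdown is taken here to mean the run time with k time-shared jobs divided by the run time of a single job alone. The function names, the `comm_fraction` parameter (fraction of a job's standalone run time spent waiting for messages) and the `switch_overhead` parameter (relative inflation of compute time from context switching and cache pollution) are hypothetical and chosen only for illustration.

```python
def gang_slowdown(k: int) -> float:
    """Toy model of gang scheduling with k identical jobs.

    Assumption: each job owns every node for 1/k of the time, and the CPU
    sits idle while the running job waits for messages, so each job takes
    k times as long as it would alone.
    """
    return float(k)


def local_slowdown(k: int, comm_fraction: float,
                   switch_overhead: float = 0.0) -> float:
    """Toy model of local scheduling with k identical jobs.

    Assumption: while one job waits for messages, the local scheduler runs
    another job, so message waits can be overlapped with computation.
    Run times are normalised so a single job alone takes 1.0.
    """
    # Compute time per job, inflated by the (hypothetical) switching cost.
    compute = (1.0 - comm_fraction) * (1.0 + switch_overhead)
    # The CPU must still execute the compute phases of all k jobs in turn.
    cpu_bound = k * compute
    # A job can never finish faster than its own compute plus message waits.
    single_job_bound = compute + comm_fraction
    return max(cpu_bound, single_job_bound)


if __name__ == "__main__":
    # comm_fraction = 0.5 is an arbitrary illustrative value, not a measurement.
    for k in (1, 2, 4, 8):
        print(k, gang_slowdown(k), local_slowdown(k, comm_fraction=0.5))
```

In this sketch, the more time a job spends waiting for messages, the more local scheduling gains over gang scheduling, while a larger `switch_overhead` erodes that gain. This mirrors the qualitative argument of the abstract only; the real behaviour depends on the measured costs reported in the paper.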
