The performance of processor co-allocation in multicluster systems

In systems consisting of multiple clusters of processors interconnected by relatively slow communication links, co-allocation, the simultaneous allocation of processors in multiple clusters to a single job, may be required. We study its performance by means of simulations, as a function of the structure and sizes of jobs and of the communication speed ratio. We model a multicluster with C clusters of identical processors. The workload consists of rigid jobs that require fixed numbers of processors, possibly in multiple clusters simultaneously. A job is represented by a tuple of C values, each generated from the same distribution D. In an ordered request, the positions of the components in the tuple specify the clusters from which the processors must be allocated. In an unordered request, the components of the tuple specify only the numbers of processors needed in separate clusters, not which clusters these are. A flexible request specifies only the total number of processors, obtained as the sum of the values in the tuple. For total requests, there is a single cluster, and a request specifies only the total number of processors needed. All intracluster communication links have the same speed, as do all intercluster links. The communication speed ratio is the ratio of the time needed to complete a send operation between processors in different clusters to the time needed between processors in the same cluster.
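The four request types can be made concrete with a small feasibility check. The following Python sketch is illustrative only and is not the simulator used in the study: the names `generate_job` and `fits`, the use of a per-cluster list of idle processors, and the example distribution are all assumptions introduced here. It tests only whether a request could be placed at all; it does not model any placement or queueing policy.

```python
import random


def generate_job(num_clusters, draw_component, rng):
    """Return a job as a tuple of C values, each drawn from the same distribution.

    draw_component is a hypothetical stand-in for the distribution D.
    """
    return tuple(draw_component(rng) for _ in range(num_clusters))


def fits(request_type, job, idle):
    """Check whether a job can be placed given idle processors per cluster.

    job  : tuple of C component sizes
    idle : list of idle processor counts, one per cluster
           (a single entry for total requests)
    """
    if request_type == "total":
        # Single cluster: only the sum of the components matters.
        if len(idle) != 1:
            raise ValueError("total requests assume a single cluster")
        return sum(job) <= idle[0]

    if len(job) != len(idle):
        raise ValueError("job and idle must both have C entries")

    if request_type == "ordered":
        # Component i must be served by cluster i.
        return all(need <= free for need, free in zip(job, idle))

    if request_type == "unordered":
        # Components go to distinct clusters, but any assignment is allowed.
        # Pairing the largest component with the emptiest cluster, and so on,
        # succeeds whenever some assignment exists.
        needs = sorted(job, reverse=True)
        frees = sorted(idle, reverse=True)
        return all(need <= free for need, free in zip(needs, frees))

    if request_type == "flexible":
        # Only the total matters; it may be spread over clusters in any way.
        return sum(job) <= sum(idle)

    raise ValueError(f"unknown request type: {request_type}")


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed example distribution: uniform on 1..4 processors per component.
    job = generate_job(3, lambda r: r.randint(1, 4), rng)
    idle = [2, 4, 1]
    for kind in ("ordered", "unordered", "flexible"):
        print(kind, job, idle, fits(kind, job, idle))
```

For example, with job `(3, 1, 2)` and idle processors `[1, 3, 2]`, the ordered request does not fit (cluster 1 has only one idle processor), whereas the unordered and flexible requests do. The less a request constrains where its processors come from, the more often it can be placed.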
