A Study on Job Co-Allocation in Multiple HPC Clusters

To use HPC clusters more effectively for even larger computations, improve turnaround times, and better utilize compute resources, users are looking to interconnect multiple HPC clusters into a grid. To use such grids effectively, it may be desirable to split jobs that require many processes and co-allocate them across multiple clusters. While splitting a very large job across multiple clusters is an attractive possibility, the benefit in terms of turnaround time ultimately depends on the communication patterns between processes, the workload on the communication links, and the maximum bandwidth of those links. The objective of this work is to understand the impact of communication on multi-processor jobs in order to develop scheduling strategies and job allocation algorithms for multi-cluster grids that take communication factors into account. In this paper we report on initial investigations of some co-allocation strategies. The evaluation is based on a simulator that has been implemented and validated experimentally across two HPC clusters.
