Job co-allocation strategies for multiple high performance computing clusters

Allocating multi-process jobs across multiple connected clusters is an attractive way to use a network of high performance computing clusters more effectively. This allocation process divides the processes of a job among several clusters, which we refer to as co-allocation. Co-allocation offers the possibility of more efficient use of computing resources, reduced turn-around time, and computations that use more processes than any single cluster can provide. Whether these benefits are realized ultimately depends on the inter-cluster communication cost. In this paper, we introduce a scalable co-allocation strategy called the Maximum Bandwidth Adjacent cluster Set (MBAS) strategy. The strategy uses two thresholds to control allocation: one sets the bandwidth limit that determines which inter-cluster communication links are usable, and the other controls how jobs are split. We developed a simulator that models the dynamic behavior of jobs running across multiple clusters and used it to examine the performance of the MBAS co-allocation strategy. Our results indicate that, by adjusting the thresholds for link-level control and chunk-size control in splitting jobs, the MBAS co-allocation strategy can significantly improve both user satisfaction and system utilization.
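
The abstract does not spell out the MBAS algorithm, so the following is only an illustrative sketch of how two such thresholds could interact, not the authors' implementation. Everything in it is an assumption made for illustration: the function name `co_allocate`, the parameters `min_bandwidth` and `min_chunk`, the best-fit rule for single-cluster placement, the choice to start from the cluster with the most free processors, and the greedy expansion over the highest-bandwidth adjacent link.

```python
from typing import Dict, Optional, Tuple


def co_allocate(
    job_size: int,
    free: Dict[str, int],
    bandwidth: Dict[Tuple[str, str], float],
    min_bandwidth: float,
    min_chunk: int,
) -> Optional[Dict[str, int]]:
    """Hypothetical greedy co-allocation sketch (not the published MBAS).

    free          -- free processors per cluster
    bandwidth     -- bandwidth of each undirected inter-cluster link
    min_bandwidth -- link-level threshold: slower links are never used
    min_chunk     -- chunk-size threshold: no cluster gets fewer processes
    Returns {cluster: processes}, or None if the job cannot be placed.
    """
    if not free:
        return None

    # No splitting needed if one cluster can hold the whole job (best fit).
    fits = [c for c in free if free[c] >= job_size]
    if fits:
        best = min(fits, key=lambda c: (free[c], c))
        return {best: job_size}

    # Link-level control: discard links below the bandwidth threshold.
    usable = {link: bw for link, bw in bandwidth.items() if bw >= min_bandwidth}

    # Assumed starting point: the cluster with the most free processors.
    start = max(free, key=lambda c: (free[c], c))
    if free[start] < min_chunk:
        return None
    alloc = {start: free[start]}
    remaining = job_size - free[start]

    while remaining > 0:
        best_bw, best_cluster = None, None
        # Look at every usable link that leaves the current cluster set.
        for (a, b), bw in usable.items():
            for inside, outside in ((a, b), (b, a)):
                if inside in alloc and outside not in alloc:
                    chunk = min(free[outside], remaining)
                    # Chunk-size control: skip clusters that would get too little.
                    if chunk >= min_chunk and (best_bw is None or bw > best_bw):
                        best_bw, best_cluster = bw, outside
        if best_cluster is None:
            return None  # no adjacent cluster satisfies both thresholds
        chunk = min(free[best_cluster], remaining)
        alloc[best_cluster] = chunk
        remaining -= chunk
    return alloc


if __name__ == "__main__":
    free = {"A": 8, "B": 6, "C": 4}
    links = {("A", "B"): 100.0, ("A", "C"): 1000.0, ("B", "C"): 10.0}
    print(co_allocate(12, free, links, min_bandwidth=50.0, min_chunk=2))
```

In this sketch, raising `min_bandwidth` shrinks the set of clusters a job may span, and raising `min_chunk` prevents a job from being split into many small pieces; both make the function return `None` more often, which a scheduler would treat as "keep the job queued".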
