An Evaluation of Communication Factors on an Adaptive Control Strategy for Job Co-allocation in Multiple HPC Clusters

Allocating a multi-process job across multiple connected high performance computing clusters, i.e., job co-allocation, can make more efficient use of computing resources, reduce turnaround time, and enable computations that use more processes than there are processors on any single cluster. The effectiveness of co-allocation ultimately depends on the cost of inter-cluster communication. We previously introduced a scalable co-allocation strategy, the Maximum Bandwidth Adjacent cluster Set (MBAS) strategy, which uses two thresholds to control job co-allocation: one for inter-cluster links and one for job partitioning. We subsequently introduced the Adaptive Threshold Control System (ATCS), which uses a fuzzy control approach to adjust these thresholds dynamically within MBAS. Results suggested that using ATCS during MBAS job co-allocation could improve overall performance. However, those results considered only jobs whose constituent processes communicate in either a master-slave or an all-to-all pattern. In this paper, we extend the analysis to jobs that exhibit 2D-mesh communication patterns and evaluate ATCS further.
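To illustrate why the communication pattern matters for inter-cluster cost, the following minimal Python sketch counts how many communicating process pairs cross a cluster boundary under each of the three patterns named above. It is not the MBAS or ATCS implementation. The function names, the 16-process job, the even split across two clusters, the non-wrapping 4x4 mesh, and the use of pair counts as a stand-in for communication cost are all assumptions made for illustration; real costs also depend on message volume and link bandwidth.

```python
from itertools import combinations

def master_slave_pairs(n):
    """Rank 0 is the master and communicates with every other rank."""
    return {(0, i) for i in range(1, n)}

def all_to_all_pairs(n):
    """Every rank communicates with every other rank."""
    return set(combinations(range(n), 2))

def mesh_2d_pairs(rows, cols):
    """Each rank communicates with its right and lower neighbours
    in a non-wrapping mesh with row-major rank numbering."""
    pairs = set()
    for r in range(rows):
        for c in range(cols):
            rank = r * cols + c
            if c + 1 < cols:
                pairs.add((rank, rank + 1))
            if r + 1 < rows:
                pairs.add((rank, rank + cols))
    return pairs

def inter_cluster_pairs(pairs, cluster_of):
    """Count the pairs whose two ranks are placed on different clusters."""
    return sum(1 for a, b in pairs if cluster_of[a] != cluster_of[b])

# Hypothetical placement: 16 processes split evenly across two clusters.
n = 16
cluster_of = [0 if rank < n // 2 else 1 for rank in range(n)]

counts = {
    "master-slave": inter_cluster_pairs(master_slave_pairs(n), cluster_of),
    "all-to-all": inter_cluster_pairs(all_to_all_pairs(n), cluster_of),
    "2D-mesh": inter_cluster_pairs(mesh_2d_pairs(4, 4), cluster_of),
}
print(counts)
```

Under these assumptions the same partition yields very different numbers of boundary-crossing pairs per pattern, which is one reason an evaluation restricted to master-slave and all-to-all jobs may not carry over to mesh-structured jobs.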
