Design and implementation of an adaptive job allocation strategy for heterogeneous multi‐cluster computing systems

Cluster computing is an attractive approach to providing high‐performance computing for large‐scale applications. Owing to advances in processor and networking technology, the expansion of clusters has resulted in system heterogeneity; it is therefore crucial to dispatch jobs to heterogeneous computing resources in a way that improves resource utilization. In this paper, we propose a new job allocation system for heterogeneous multi‐cluster environments, named the Adaptive Job Allocation Strategy (AJAS), in which a self‐scheduling scheme is applied in the scheduler to dispatch jobs to the most appropriate computing resources. Our strategy focuses on increasing resource utilization by dispatching jobs to computing nodes with similar performance capacities, so that execution times across the nodes are equalized. The experimental results show that AJAS can improve system performance. Copyright © 2011 John Wiley & Sons, Ltd.
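The idea of dispatching a job to nodes with similar performance capacities can be illustrated with a minimal sketch. This is not the AJAS algorithm itself, whose details are not given in the abstract. It assumes each idle node carries a single numeric performance score (the hypothetical `perf` field) and that a job asks for `k` nodes; the names `Node` and `pick_similar_nodes` are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cluster: str
    perf: float  # hypothetical relative performance score (higher is faster)


def pick_similar_nodes(nodes, k):
    """Return k nodes whose performance scores are closest together.

    Assumption for this sketch: nodes with near-equal scores finish equal
    shares of a job at about the same time.
    """
    if k <= 0 or k > len(nodes):
        raise ValueError("k must be between 1 and the number of idle nodes")
    # Sort fastest first so every window of k neighbours is a candidate group.
    ranked = sorted(nodes, key=lambda n: n.perf, reverse=True)
    best_start, best_spread = 0, float("inf")
    for start in range(len(ranked) - k + 1):
        # Spread = gap between the fastest and slowest node in this window.
        spread = ranked[start].perf - ranked[start + k - 1].perf
        if spread < best_spread:  # strict comparison: ties keep the faster window
            best_start, best_spread = start, spread
    return ranked[best_start:best_start + k]


idle = [
    Node("a1", "A", 30.0), Node("a2", "A", 29.0),
    Node("b1", "B", 10.0), Node("b2", "B", 11.0),
    Node("c1", "C", 20.0),
]
chosen = pick_similar_nodes(idle, 2)
print([n.name for n in chosen])
```

Minimizing only the spread can select a uniformly slow group when it is the tightest one, so a practical scheduler would presumably also weigh absolute capacity, cluster membership, and network cost.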
