A Resilient Hierarchical Distributed Loop Self-Scheduling Scheme for Cloud Systems

In heterogeneous distributed cloud systems, load balance as well as communication and synchronization overhead must be taken into account. A hierarchical distributed loop self-scheduling scheme is effective and efficient for scientific loop applications. In this paper, we propose a resilient hierarchical distributed loop self-scheduling algorithm suitable for heterogeneous cloud systems. The algorithm is designed to keep working when some virtual machines (VMs) become too slow or stop responding. We tested the algorithm in a heterogeneous cloud system. The results show that it continues to operate normally and achieves good performance.
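The general idea of resilient loop self-scheduling can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the algorithm proposed in the paper: it uses a single flat master rather than a hierarchy, a guided-style chunk rule (remaining iterations divided by the number of workers), and a hypothetical `failed` set standing in for VMs that time out. The names `guided_chunks` and `run_resilient` are illustrative only.

```python
from collections import deque


def guided_chunks(total, num_workers):
    """Split `total` loop iterations into decreasing chunks.

    Assumed rule: each chunk is ceil(remaining / num_workers) iterations.
    Returns a list of half-open (start, end) iteration ranges.
    """
    chunks = []
    start = 0
    remaining = total
    while remaining > 0:
        size = -(-remaining // num_workers)  # ceiling division, always >= 1
        chunks.append((start, start + size))
        start += size
        remaining -= size
    return chunks


def run_resilient(total, workers, failed=frozenset()):
    """Simulate a master that re-queues chunks given to unresponsive workers.

    `failed` is a hypothetical set of worker names that never answer;
    a real system would detect this with a timeout.
    Returns a dict mapping each (start, end) chunk to the worker that ran it.
    """
    pending = deque(guided_chunks(total, len(workers)))
    alive = deque(workers)
    assignment = {}
    while pending:
        if not alive:
            raise RuntimeError("no responsive workers left")
        chunk = pending.popleft()
        worker = alive.popleft()
        if worker in failed:
            # Simulated timeout: drop the worker and put the chunk back.
            pending.appendleft(chunk)
            continue
        assignment[chunk] = worker
        alive.append(worker)  # worker is free to request another chunk
    return assignment


if __name__ == "__main__":
    result = run_resilient(100, ["vm0", "vm1", "vm2", "vm3"], failed={"vm2"})
    for (start, end), worker in sorted(result.items()):
        print(f"iterations [{start}, {end}) -> {worker}")
```

In this sketch every iteration is still executed exactly once even though `vm2` never responds, because its chunk goes back to the front of the queue and is handed to the next responsive worker.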
