ComprehensiveBench: a Benchmark for the Extensive Evaluation of Global Scheduling Algorithms

Parallel applications whose tasks have imbalanced loads or complex communication behavior usually fail to exploit the full potential of the underlying parallel platform. Global scheduling algorithms are employed to mitigate this issue. Because finding the optimal task distribution is an NP-hard problem, identifying the most suitable algorithm for a specific scenario and comparing algorithms are not trivial tasks. In this context, this paper presents ComprehensiveBench, a benchmark for global scheduling algorithms that allows a wide range of performance-affecting parameters to be varied. ComprehensiveBench can be used to assist in the development and evaluation of new scheduling algorithms, to help choose an algorithm for an arbitrary application, to emulate other applications, and to enable statistical tests. We illustrate its use with an evaluation of Charm++ periodic load balancers that stresses their characteristics.
