Scheduling many-task workloads on supercomputers: Dealing with trailing tasks

In order for many-task applications to be attractive candidates for running on high-end supercomputers, they must be able to benefit from the additional compute, I/O, and communication performance that high-end HPC hardware provides relative to clusters, grids, or clouds. Typically this means that the application should use the HPC resource in such a way that it reduces time to solution beyond what is possible otherwise. Furthermore, it must use the computational resources efficiently, achieving high levels of utilization. Satisfying these twin goals is not trivial: the parallelism of many-task computations can vary over time, yet on many large machines the allocation policy requires that worker CPUs be provisioned, and relinquished, in large blocks rather than individually. This paper discusses the problem in detail, explaining and characterizing the trade-off between utilization and time to solution under the allocation policies of the Blue Gene/P system Intrepid at Argonne National Laboratory. We propose and test two strategies to improve this trade-off: scheduling tasks in order of longest to shortest (applicable only when task runtimes are predictable) and downsizing allocations when utilization drops below a given threshold. We show that both strategies are effective under different conditions.

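The following is a minimal simulation sketch, not the paper's implementation, of the two strategies described above: dispatching tasks longest-first when runtimes are known, and relinquishing a worker block once it drains during the trailing-task phase (a simplification of the threshold-based downsizing policy). The function name `simulate`, the block sizes, and the synthetic exponential runtimes are illustrative assumptions.

```python
# Minimal sketch (assumed model, not the authors' code): list-scheduling of
# independent tasks onto fixed-size worker blocks, to illustrate the effect of
# longest-first dispatch and of releasing a block once its last task finishes.

import heapq
import random


def simulate(runtimes, workers_per_block, num_blocks,
             longest_first=False, release_drained_blocks=False):
    """Return (makespan, utilization) under a simple list-scheduling model."""
    tasks = sorted(runtimes, reverse=True) if longest_first else list(runtimes)

    # One heap entry per worker: (time this worker becomes free, block id).
    workers = [(0.0, w // workers_per_block)
               for w in range(workers_per_block * num_blocks)]
    heapq.heapify(workers)

    busy_time = 0.0
    block_finish = [0.0] * num_blocks   # last time each block does useful work
    for runtime in tasks:
        free_at, block = heapq.heappop(workers)   # earliest-available worker
        finish = free_at + runtime
        heapq.heappush(workers, (finish, block))
        busy_time += runtime
        block_finish[block] = max(block_finish[block], finish)

    makespan = max(t for t, _ in workers)
    if release_drained_blocks:
        # A block is relinquished as soon as its last task finishes, so idle
        # trailing time no longer counts as provisioned worker-time.
        provisioned = workers_per_block * sum(block_finish)
    else:
        provisioned = workers_per_block * num_blocks * makespan
    return makespan, busy_time / provisioned


if __name__ == "__main__":
    random.seed(0)
    runtimes = [random.expovariate(1 / 60.0) for _ in range(10_000)]  # seconds
    for lf in (False, True):
        for rel in (False, True):
            span, util = simulate(runtimes, 256, 4,
                                  longest_first=lf,
                                  release_drained_blocks=rel)
            print(f"longest_first={lf!s:5} release={rel!s:5} "
                  f"makespan={span:8.0f}s utilization={util:.1%}")
```

Under these assumptions, longest-first ordering mainly shortens the trailing phase (fewer long tasks left running at the end), while releasing drained blocks trades nothing in makespan but recovers utilization; the paper characterizes this trade-off under Intrepid's actual block-allocation policies.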