Offline and Online Scheduling of Concurrent Bags-of-Tasks on Heterogeneous Platforms

Scheduling problems are already difficult on traditional parallel machines. They become extremely challenging on heterogeneous clusters, even for embarrassingly parallel applications. In this paper we address the problem of scheduling multiple applications, each made of a collection of independent and identical tasks, on a heterogeneous master-worker platform. The applications are submitted online, which means that there is no a priori (static) knowledge of the workload distribution at the beginning of the execution. The objective is to minimize the maximum stretch, i.e., the maximum over all applications of the ratio between the time an application actually spends in the system and the time it would have spent if executed alone. On the theoretical side, we design an optimal algorithm for the offline version of the problem (when all release dates and application characteristics are known beforehand). We also introduce several heuristics for the general case of online applications. On the practical side, we have conducted extensive simulations and MPI experiments, showing that we can handle very large problem instances in a few seconds. Moreover, the solutions that we compute significantly outperform classical heuristics from the literature, which demonstrates the usefulness of our approach.
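The maximum-stretch objective defined above can be illustrated with a minimal Python sketch. The `Application` class, its field names, and the sample numbers are hypothetical and chosen only for illustration. The sketch assumes that the release date, the completion time in the shared schedule, and the stand-alone execution time of each application are already known; it only evaluates the metric for a given schedule and does not implement the offline algorithm or the online heuristics of the paper.

```python
from dataclasses import dataclass


@dataclass
class Application:
    """Hypothetical record for one bag-of-tasks application."""
    release: float      # time at which the application is submitted
    completion: float   # time at which its last task finishes in the shared schedule
    alone_time: float   # time it would need with the whole platform to itself


def stretch(app: Application) -> float:
    """Time actually spent in the system divided by the stand-alone time."""
    return (app.completion - app.release) / app.alone_time


def max_stretch(apps: list[Application]) -> float:
    """Objective to minimize: the worst stretch over all applications."""
    return max(stretch(app) for app in apps)


# Illustrative values only (not taken from the paper).
apps = [
    Application(release=0.0, completion=10.0, alone_time=5.0),  # stretch 2.0
    Application(release=2.0, completion=8.0, alone_time=3.0),   # stretch 2.0
    Application(release=4.0, completion=7.0, alone_time=3.0),   # stretch 1.0
]
print(max_stretch(apps))
```

Because each application's time in the system is normalized by its own stand-alone time, a small application delayed behind a large one is penalized as much as a large application slowed down by the same factor.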
