Centralized versus Distributed Schedulers for Bag-of-Tasks Applications

Multiple applications that execute concurrently on heterogeneous platforms compete for CPU and network resources. In this paper, we consider the problem of scheduling applications to ensure fair and efficient execution on a distributed network of processors. We limit our study to the case where communication is restricted to a tree embedded in the network, and the applications consist of a large number of independent tasks (Bags of Tasks) that originate at the tree's root. The tasks of a given application all have the same computation and communication requirements, but these requirements can vary across applications. The goal of scheduling is to maximize the throughput of each application while ensuring a fair sharing of resources among applications. We can find the optimal asymptotic rates by solving a linear program that expresses all necessary problem constraints, and we show how to construct a periodic schedule from any solution of this linear program. For single-level trees, the solution is characterized by processing tasks with larger communication-to-computation ratios at children with larger bandwidths. For multilevel trees, the linear-programming approach requires global knowledge of all application and platform parameters. For large-scale platforms, such global coordination by a centralized scheduler may be unrealistic. Thus, we also investigate decentralized schedulers that use only local information at each participating resource. We assess their performance via simulation and compare them with an optimal centralized solution obtained via linear programming. The best of our decentralized heuristics matches the optimal centralized solution on about 2/3 of our test cases but is far worse in a few cases.
Although our results are based on simple assumptions and do not explore all parameters (such as the maximum number of tasks that can be held on a node), they provide insight into the important question of fairly and optimally scheduling heterogeneous applications on heterogeneous grids.
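
For intuition about the single-level case, the sketch below computes the optimal steady-state throughput of a single bag of tasks on a star under a one-port model, using the known bandwidth-centric rule: serve children in order of increasing per-task communication time, keep each one fully busy until the master's send port is saturated, and give the next child whatever port time remains. It assumes one application only and a master that does not compute; the function name and the `(c, w)` pairs are illustrative, and this is not the multi-application linear program studied here.

```python
from fractions import Fraction

def bandwidth_centric_throughput(children):
    """Steady-state tasks per time unit for ONE application on a star.

    children: list of (c, w) pairs, where c is the time for the master to send
    one task to that child and w is the child's time to compute one task.
    Assumes a one-port master that does not compute itself.
    """
    throughput = Fraction(0)
    port_left = Fraction(1)  # fraction of the master's send port still unused
    # Order by communication time only: computation speed does not affect priority.
    for c, w in sorted(children, key=lambda cw: cw[0]):
        need = Fraction(c) / w  # port share needed to keep this child always busy
        if need <= port_left:
            throughput += Fraction(1) / w  # child runs at full speed
            port_left -= need
        else:
            throughput += port_left / c  # child gets only the leftover port time
            break  # port saturated; slower links receive nothing
    return throughput

# The child with the fastest CPU (w = 1) but slowest link (c = 3) gets no work.
print(bandwidth_centric_throughput([(1, 2), (2, 2), (3, 1)]))  # prints 3/4
```

The example shows why bandwidth, not CPU speed, decides which children participate: the master's outgoing port is the shared bottleneck.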
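
The kind of constraints such a linear program expresses can be illustrated on a star. The sketch below assumes a one-port master that sends tasks sequentially, children that overlap communication with computation, and rates `alpha[i][k]` (tasks of application k per time unit sent to child i). It computes the busy fraction of the master's port and of each child's CPU, scales a candidate allocation until the busiest resource is saturated, and compares two pure assignments. All names and numbers are hypothetical, and the comparison covers only two candidate allocations; it does not solve the linear program.

```python
from fractions import Fraction as F

def resource_loads(alpha, data, work, bandwidth, speed):
    """Busy fraction of the master's send port and of each child's CPU."""
    children, apps = range(len(alpha)), range(len(data))
    # One-port master: sending one task of app k to child i takes data[k] / bandwidth[i].
    port = sum(alpha[i][k] * data[k] / bandwidth[i] for i in children for k in apps)
    # Child i computes one task of app k in work[k] / speed[i].
    cpu = [sum(alpha[i][k] * work[k] / speed[i] for k in apps) for i in children]
    return port, cpu

def scale_to_capacity(alpha, data, work, bandwidth, speed):
    """Largest factor by which alpha can be multiplied with every load <= 1."""
    port, cpu = resource_loads(alpha, data, work, bandwidth, speed)
    return 1 / max([port, *cpu])

def app_throughputs(alpha, factor):
    """Throughput of each application once alpha is scaled by factor."""
    return [factor * sum(row[k] for row in alpha) for k in range(len(alpha[0]))]

# Hypothetical platform: child 0 has a fast link, child 1 a slow one; equal CPUs.
bandwidth, speed = [F(10), F(1)], [F(1), F(1)]
# App A has communication-to-computation ratio 5, app B has ratio 1.
data, work = [F(5), F(1)], [F(1), F(1)]

aligned = [[1, 0], [0, 1]]  # high-ratio app A on the fast link
crossed = [[0, 1], [1, 0]]  # high-ratio app A on the slow link
for name, alpha in [("aligned", aligned), ("crossed", crossed)]:
    factor = scale_to_capacity(alpha, data, work, bandwidth, speed)
    print(name, min(app_throughputs(alpha, factor)))
```

Placing the application with the larger ratio on the faster link gives each application 2/3 task per time unit, versus 10/51 for the crossed assignment, which agrees with the single-level characterization. A complete solution would pass the same constraints to an LP solver with a fairness objective (for example, maximizing the minimum throughput) instead of comparing hand-picked allocations.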
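
Once a linear program returns rational rates, one simple way to obtain a periodic schedule is to take the period equal to the least common multiple of the rates' denominators, so that every rate corresponds to a whole number of tasks per period. The sketch assumes rates are `fractions.Fraction` values keyed by hypothetical (child, application) labels and needs Python 3.9+ for `math.lcm`. It computes only the per-period task counts; ordering the sends within a period and the start-up and wind-down phases are not shown, and this is not necessarily the construction used in this paper.

```python
from fractions import Fraction
from math import lcm

def periodic_schedule(rates):
    """Turn rational steady-state rates into integer per-period task counts.

    rates maps (child, app) to a Fraction: tasks per time unit in the LP solution.
    Returns (period, counts): every `period` time units, the child receives
    counts[(child, app)] tasks of that app, so long-run rates match exactly.
    """
    period = lcm(*(r.denominator for r in rates.values()))
    counts = {key: int(r * period) for key, r in rates.items()}
    return period, counts

rates = {("P1", "A"): Fraction(1, 2), ("P1", "B"): Fraction(1, 4),
         ("P2", "B"): Fraction(2, 3)}
period, counts = periodic_schedule(rates)
print(period, counts)
```

If the rates satisfy the linear program's port and CPU constraints, the total sending time and each child's computing time for one period's tasks are at most the period length, because each load is scaled by the same factor `period`.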
