Practical divisible load scheduling on grid platforms with APST-DV

Divisible load applications consist of a load, that is, input data and associated computation, which can be divided arbitrarily into independent pieces. Such applications arise in many fields and are ideally suited to master-worker execution, but they pose several scheduling challenges. While the "divisible load scheduling" (DLS) problem has been studied extensively from a theoretical standpoint, in this paper we focus on practical issues: we extend a production grid application execution environment, APST, to support divisible load applications; we implement previously proposed DLS algorithms as part of APST; we evaluate and compare these algorithms on a real-world two-cluster platform; we show in a case study how a user can easily and effectively run a real-world divisible load application; and we uncover several issues that are critical for using DLS theory in practice. To the best of our knowledge, the software resulting from this work, APST-DV, is the first usable and generic tool for deploying divisible load applications on distributed computing platforms.
