WSCOM: Online Task Scheduling with Data Transfers

This paper considers the online problem of scheduling tasks with communication. No information about the tasks or their communications is available in advance, except the DAG describing the task topology. This situation typically arises when scheduling the task DAGs that correspond to Makefile executions. To tackle this problem, we introduce WSCOM, a new family of work-stealing algorithms. These algorithms use knowledge of the DAG topology to cluster communicating tasks together and reduce the total number of communications. Several variants are designed to overlap communication or to optimize the graph decomposition. Performance is evaluated by simulation, and our algorithms are compared with off-line list-scheduling algorithms and classical work stealing from the literature. Simulations are run on both random graphs and a new trace archive of Makefile DAGs. These experiments validate the design choices made. In particular, we show that WSCOM achieves performance close to that of off-line algorithms in most cases, and even outperforms them under congestion because it transfers less data. Moreover, WSCOM can match the high performance of classical work stealing with up to ten times less bandwidth.
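To make the idea of clustering communicating tasks concrete, here is a minimal Python sketch. It is not the WSCOM clustering rule, which the abstract does not specify. It assumes a simple stand-in heuristic (merging linear chains, where a task's only successor has that task as its only predecessor) and counts how many DAG edges still cross cluster boundaries. The names `cluster_chains` and `cross_cluster_edges`, and the example graph, are hypothetical.

```python
from collections import defaultdict


def cluster_chains(tasks, edges):
    """Map each task to a cluster id by merging linear chains.

    Assumed heuristic (not taken from the paper): an edge u -> v is merged
    when u has exactly one successor and v has exactly one predecessor.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Union-find structure: every task starts in its own cluster.
    parent = {t: t for t in tasks}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for u, v in edges:
        if len(succ[u]) == 1 and len(pred[v]) == 1:
            parent[find(v)] = find(u)

    return {t: find(t) for t in tasks}


def cross_cluster_edges(edges, cluster_of):
    """Count edges whose endpoints lie in different clusters.

    If each cluster runs on one processor, only these edges need a transfer.
    """
    return sum(1 for u, v in edges if cluster_of[u] != cluster_of[v])


# Hypothetical DAG: task "a" fans out into two independent chains.
tasks = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e")]

clusters = cluster_chains(tasks, edges)
remaining = cross_cluster_edges(edges, clusters)
print(f"{remaining} of {len(edges)} edges cross cluster boundaries")
```

In this example the chains b, c and d, e each collapse into one cluster, so only the two edges leaving "a" would require a data transfer if every cluster were executed on a single processor.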
