Analysis of Work Stealing with latency

Abstract: We study the impact of communication latency on the classical Work Stealing load-balancing algorithm. We extend the reference model by introducing a latency parameter. Using theoretical analysis and simulation, we study the overall impact of this latency on the makespan (maximum completion time). We derive a new expression for the expected running time of a bag of independent tasks scheduled by Work Stealing. This expression lets us predict the conditions under which a given run will yield acceptable performance. For instance, we can easily calibrate the maximal number of processors to use for a given work/platform combination. All our results are validated through simulation over a wide range of parameters.
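To make the setting concrete, the following is a minimal simulation sketch of Work Stealing with a latency parameter. It is not the model or the simulator used in the paper. It assumes unit-size independent tasks, all work initially placed on processor 0, a uniformly random victim, a steal that takes half of the victim's remaining tasks, and a delay of `latency` time steps in each direction. The function name `simulate_work_stealing` and its parameters are hypothetical.

```python
import heapq
import random


def simulate_work_stealing(total_work, processors, latency, seed=0):
    """Return the makespan (in time steps) of one simulated run.

    Assumed toy model, not the paper's: unit tasks, all work starts on
    processor 0, an idle processor sends one steal request at a time to a
    random victim, the victim hands over half of its remaining tasks, and
    each message takes `latency` steps to arrive.
    """
    if total_work < 0 or processors < 1 or latency < 1:
        raise ValueError("need total_work >= 0, processors >= 1, latency >= 1")

    rng = random.Random(seed)
    load = [0] * processors          # tasks currently held by each processor
    load[0] = total_work
    waiting = [False] * processors   # True while a steal request is pending
    events = []                      # heap of (time, seq, kind, thief, value)
    seq = 0                          # unique tie-breaker for heap ordering
    remaining = total_work           # tasks not yet executed (held or in flight)
    t = 0

    while remaining > 0:
        # Deliver every message that has arrived by time t.
        while events and events[0][0] <= t:
            _, _, kind, thief, value = heapq.heappop(events)
            if kind == "request":
                victim = value
                stolen = load[victim] // 2   # victim keeps the larger half
                load[victim] -= stolen
                heapq.heappush(events, (t + latency, seq, "answer", thief, stolen))
                seq += 1
            else:  # "answer": stolen tasks (possibly zero) reach the thief
                load[thief] += value
                waiting[thief] = False

        # Busy processors execute one task; idle ones try to steal.
        for i in range(processors):
            if load[i] > 0:
                load[i] -= 1
                remaining -= 1
            elif processors > 1 and not waiting[i]:
                victim = rng.randrange(processors - 1)
                if victim >= i:              # skip self
                    victim += 1
                waiting[i] = True
                heapq.heappush(events, (t + latency, seq, "request", i, victim))
                seq += 1
        t += 1

    return t
```

Running this sketch for a fixed `total_work` while varying `processors` and `latency` gives a rough feel for the trade-off the abstract describes: under these assumptions, adding processors stops helping once steal round trips dominate the useful work per processor.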
