On the Benefits of Work Stealing in Shared-Memory Multiprocessors

Load balancing is one of the key techniques used to improve the performance of parallel programs, but it is a difficult task for the programmer. Work stealing is an architectural mechanism that improves performance by instantaneously balancing the load among the processors of a multiprocessor system. In this work, we develop a queueing model of a shared-memory multiprocessor system to show that work stealing can ease the burden on the programmer by eliminating the need to balance load manually.
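
To make the mechanism concrete, the following is a minimal sketch of work stealing in a toy discrete-time simulation. It is not the queueing model developed in this work. The function name `makespan`, the unit time step, the zero-cost steal, and the "steal from the tail of the longest queue" rule are all assumptions chosen for illustration.

```python
from collections import deque


def makespan(queues, steal):
    """Return the number of unit time steps needed to finish all tasks.

    queues: one list of positive integer task durations per processor
            (hypothetical input format for this sketch).
    steal:  if True, an idle processor with an empty queue takes one task
            from the tail of the currently longest queue, at zero cost
            (an idealised, instantaneous steal; an assumption).
    """
    queues = [deque(q) for q in queues]  # private copy of each task queue
    running = [0] * len(queues)          # time left on each processor's task
    steps = 0
    while any(running) or any(queues):
        for p, q in enumerate(queues):
            if running[p] == 0:
                if not q and steal:
                    victim = max(queues, key=len)
                    if victim:                  # someone still has queued work
                        q.append(victim.pop())  # steal from the victim's tail
                if q:
                    running[p] = q.popleft()    # start the next local task
        # Advance every busy processor by one time unit.
        running = [max(r - 1, 0) for r in running]
        steps += 1
    return steps


# A badly balanced program: all eight unit tasks start on processor 0.
unbalanced = [[1] * 8, [], [], []]
print(makespan(unbalanced, steal=False))  # 8
print(makespan(unbalanced, steal=True))   # 2
```

With all work placed on one of four processors, the run without stealing takes 8 steps, while the run with stealing takes 2, the same as a perfect manual partition. Under these idealised assumptions, the programmer's initial task placement stops mattering.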
