A dynamic-sized nonblocking work stealing deque

The non-blocking work-stealing algorithm of Arora, Blumofe, and Plaxton (henceforth ABP work-stealing) is on its way to becoming the multiprocessor load-balancing technology of choice in both industry and academia. This highly efficient scheme is based on a collection of array-based double-ended queues (deques) with low-cost synchronization among local and stealing processes. Unfortunately, the algorithm's synchronization protocol depends heavily on fixed-size arrays, which are prone to overflow, especially in the multiprogrammed environments for which they are designed. This is a significant drawback: apart from the memory inefficiency, it means that the size of the deque must be tailored to accommodate the effects of the hard-to-predict level of multiprogramming, and the implementation must include an expensive and application-specific overflow mechanism.

This paper presents the first dynamic memory work-stealing algorithm. It is based on a novel way of building non-blocking dynamic-sized work-stealing deques by detecting synchronization conflicts based on “pointer-crossing” rather than “gaps between indexes” as in the original ABP algorithm. As we show, the new algorithm dramatically increases robustness and memory efficiency, while causing applications no observable performance penalty. We therefore believe it can replace array-based ABP work-stealing deques, eliminating the need for application-specific overflow mechanisms.
