Lock-free Concurrent Data Structures

© 2017 by John Wiley & Sons, Inc. All rights reserved. Concurrent data structures are the data-sharing side of parallel programming. An implementation of a data structure is called lock-free if it allows multiple processes/threads to access the data structure concurrently and also guarantees that at least one operation among those finishes in a finite number of its own steps, regardless of the state of the other operations. This chapter provides sufficient background and intuition to help the interested reader navigate the complex research area of lock-free data structures. It gives the programmer enough familiarity with the subject to use truly concurrent methods. The chapter discusses the fundamental synchronization primitives on which efficient lock-free data structures rely, and the problem of managing dynamically allocated memory in lock-free concurrent data structures and general concurrent environments. It also discusses the idiosyncratic architectural features of graphics processors that are important to consider when designing efficient lock-free concurrent data structures for this emerging area.
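The lock-free guarantee defined above can be made concrete with a small sketch. The C++ fragment below is an illustration only and is not taken from the chapter: the class name `LockFreeCounter` and its methods are hypothetical, and compare-and-swap (CAS), exposed in C++ as `std::atomic::compare_exchange_strong`, is assumed as the synchronization primitive. A thread's CAS can fail only because another thread's update succeeded in the meantime, so some operation always completes even if other threads are delayed or suspended.

```cpp
#include <atomic>

// Hypothetical illustration (not from the chapter): a shared counter
// updated with a compare-and-swap (CAS) retry loop instead of a lock.
class LockFreeCounter {
public:
    // Atomically adds 1 and returns the value observed before the update.
    int increment() {
        int observed = value_.load(std::memory_order_relaxed);
        // compare_exchange_strong writes observed + 1 only if value_ still
        // equals observed. On failure it stores the current value into
        // observed, so the loop retries with fresh data. A failure means
        // some other thread's update went through, which is the system-wide
        // progress that lock-freedom requires.
        while (!value_.compare_exchange_strong(observed, observed + 1,
                                               std::memory_order_acq_rel,
                                               std::memory_order_relaxed)) {
            // Nothing to do here: observed was refreshed by the failed CAS.
        }
        return observed;
    }

    // Returns the current value of the counter.
    int get() const { return value_.load(std::memory_order_acquire); }

private:
    std::atomic<int> value_{0};
};
```

Any number of threads may call `increment()` on the same object without a mutex. An individual thread may retry an unbounded number of times under contention, which is why a loop of this kind is lock-free but not wait-free.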
