Expander: Lock-Free Cache for a Concurrent Data Structure

Parallel programming models and paradigms are becoming more expressive as the number of cores that can be placed on a single chip steadily increases. Concurrent data structures for shared-memory parallel programs are now used in operating systems, middleware, and device drivers. In such a shared-memory model, processes communicate and synchronize by applying primitive operations on memory words. To implement concurrent data structures that are linearizable and possibly lock-free or wait-free, it is often necessary to attach additional information to memory words in a data structure. This information can range from a single bit to multiple bits that typically represent thread ids, request ids, timestamps, and other application-dependent fields. Since most processors can perform compare-and-set (CAS) or load-link/store-conditional (LL/SC) operations on only 64 bits at a time, current approaches either use some bits of a memory word to pack the additional information (packing), or use the word to store a pointer to an object that contains both the additional information and the original data item (redirection). The former approach restricts the number of bits available to each additional field, which reduces the field's range; the latter is wasteful in terms of space. In this paper, we propose a novel and universal method called a memory word expander (EXPANDER). It caches information for a set of memory locations that need to be augmented with additional information. It supports the traditional atomic get, set, and CAS operations, and tries to maintain state for a minimum number of entries. We experimentally demonstrate that it can reduce the runtime memory footprint by 20-35% for algorithms that use redirection. For algorithms that use packing, using the EXPANDER can make them feasible. The performance overhead is within 2-13% for 32 threads. When we compare the performance of the EXPANDER-based non-blocking algorithms with versions that use locks, we observe a performance gain of 10-100X.
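The two existing approaches named in the abstract, packing and redirection, can be illustrated with a minimal C++ sketch. It is not the paper's implementation and does not show the EXPANDER itself. The 16/48-bit split, the `Record` layout, and all function names are assumptions chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical layout (an assumption, not from the paper):
// upper 16 bits hold a thread id, lower 48 bits hold the data value.
constexpr unsigned kTidBits = 16;
constexpr unsigned kValueBits = 64 - kTidBits;
constexpr std::uint64_t kValueMask = (std::uint64_t{1} << kValueBits) - 1;

// Packing: both fields share one 64-bit word, so each field's range shrinks.
inline std::uint64_t pack(std::uint64_t value, std::uint64_t tid) {
    assert(value <= kValueMask);                   // value limited to 48 bits
    assert(tid < (std::uint64_t{1} << kTidBits));  // tid limited to 16 bits
    return (tid << kValueBits) | value;
}

inline std::uint64_t value_of(std::uint64_t word) { return word & kValueMask; }
inline std::uint64_t tid_of(std::uint64_t word) { return word >> kValueBits; }

// A single 64-bit CAS updates the value and the thread id together.
inline bool packed_cas(std::atomic<std::uint64_t>& word,
                       std::uint64_t exp_value, std::uint64_t exp_tid,
                       std::uint64_t new_value, std::uint64_t new_tid) {
    std::uint64_t expected = pack(exp_value, exp_tid);
    return word.compare_exchange_strong(expected, pack(new_value, new_tid));
}

// Redirection: the word stores a pointer to a record that carries the
// data item plus full-width extra fields, at the cost of one object per word.
struct Record {
    std::uint64_t value;
    std::uint64_t tid;
    std::uint64_t timestamp;
};

// CAS on the pointer swaps in a whole new record atomically.
// Memory reclamation of the old record is deliberately left out of this sketch.
inline bool redirected_cas(std::atomic<Record*>& slot,
                           Record* expected, Record* desired) {
    return slot.compare_exchange_strong(expected, desired);
}
```

In the packed version every extra field competes for the same 64 bits, while the redirected version keeps full-width fields but allocates a separate record for every augmented word. That is the range-versus-space trade-off the abstract describes.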