LL/SC and Atomic Copy: Constant Time, Space Efficient Implementations using only pointer-width CAS

When designing concurrent algorithms, Load-Link/Store-Conditional (LL/SC) is often the ideal primitive to have because, unlike Compare-and-Swap (CAS), LL/SC is immune to the ABA problem. However, the full semantics of LL/SC are not supported by any modern machine, so there has been a significant amount of work on simulating LL/SC using CAS, a synchronization primitive that enjoys widespread hardware support. All constant-time algorithms so far either use unbounded sequence numbers (and thus base objects of unbounded size) or require $\Omega(MP)$ space for $M$ LL/SC objects, where $P$ is the number of processes. We present a constant-time implementation of $M$ LL/SC objects using $\Theta(M+kP^2)$ space, where $k$ is the maximum number of overlapping LL/SC operations per process (usually a constant), and requiring only pointer-sized CAS objects. Our implementation can also be used to implement $L$-word LL/SC objects in $\Theta(L)$ time (for both LL and SC) and $\Theta((M+kP^2)L)$ space. To achieve these bounds, we begin by implementing a new primitive called Single-Writer Copy, which takes a pointer to a word-sized memory location and atomically copies its contents into another object. The restriction is that only one process is allowed to write or copy into the destination object at a time. We believe this primitive will be very useful in designing other concurrent algorithms as well.
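
To make the ABA problem and the sequence-number workaround mentioned above concrete, here is a minimal Python sketch. It is not the construction presented in this paper: it illustrates the earlier approach of pairing each value with an unbounded sequence number, which is exactly the unbounded-size base object the paper sets out to avoid. The names `CASCell`, `TaggedLLSC`, and `aba_demo` are hypothetical, and CAS is simulated with a lock because Python exposes no hardware CAS.

```python
import threading


class CASCell:
    """A memory cell offering compare-and-swap.

    Assumption: hardware CAS is simulated here with a lock.
    """

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def cas(self, expected, new):
        # Atomically replace the contents only if they equal `expected`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


class TaggedLLSC:
    """LL/SC simulated from CAS using (value, sequence number) pairs.

    The sequence number grows without bound, so the base object is not
    pointer-sized; this is the drawback described in the abstract.
    """

    def __init__(self, value):
        self._cell = CASCell((value, 0))
        self._linked = {}  # process id -> pair observed by that process's last LL

    def ll(self, pid):
        snapshot = self._cell.load()
        self._linked[pid] = snapshot
        return snapshot[0]

    def sc(self, pid, new_value):
        value, seq = self._linked[pid]
        # Succeeds only if no successful SC occurred since this process's LL,
        # because every successful SC increments the sequence number.
        return self._cell.cas((value, seq), (new_value, seq + 1))


def aba_demo():
    # Plain CAS: process 0 reads "A", process 1 changes A -> B -> A,
    # and process 0's CAS still succeeds even though the cell was modified.
    plain = CASCell("A")
    seen = plain.load()
    plain.cas("A", "B")
    plain.cas("B", "A")
    plain_cas_succeeded = plain.cas(seen, "C")

    # LL/SC: the same interleaving makes process 0's SC fail.
    obj = TaggedLLSC("A")
    obj.ll(0)
    obj.ll(1)
    obj.sc(1, "B")
    obj.ll(1)
    obj.sc(1, "A")
    sc_succeeded = obj.sc(0, "C")
    return plain_cas_succeeded, sc_succeeded
```

Calling `aba_demo()` runs the same A, B, A interleaving against both objects: the plain CAS cannot tell that the cell was modified, while the SC of process 0 fails because the sequence number has advanced.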
