Laws of order

Building correct and efficient concurrent algorithms is known to be a difficult problem of fundamental importance. To achieve efficiency, designers try to remove unnecessary and costly synchronization. However, this manual trial-and-error process is not only ad hoc, time consuming, and error-prone; it also leaves designers pondering a question: is it inherently impossible to eliminate certain synchronization, or did this particular attempt simply fail, so that one should keep trying? In this paper we answer this question. We prove that it is impossible to build concurrent implementations of classic and ubiquitous specifications, such as sets, queues, stacks, mutual exclusion, and read-modify-write operations, that completely eliminate the use of expensive synchronization. Specifically, one cannot avoid the use of either: (i) read-after-write (RAW), where a write to a shared variable A is followed by a read of a different shared variable B without a write to B in between, or (ii) atomic write-after-read (AWAR), where an atomic operation reads and then writes shared locations. Unfortunately, enforcing RAW or AWAR is expensive on all current mainstream processors. To enforce RAW, memory-ordering instructions, also called fences or barriers, must be used; to enforce AWAR, atomic instructions such as compare-and-swap are required. Both kinds of instruction are typically substantially slower than regular instructions. Although algorithm designers frequently struggle to avoid RAW and AWAR, their attempts are often futile. Our result characterizes the cases where avoiding RAW and AWAR is impossible. On the flip side, our result can be used to guide designers toward new algorithms in the cases where RAW and AWAR can be eliminated.
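To make the two patterns concrete, here is a minimal sketch in C11 atomics; it is an illustration of the patterns as defined in the abstract, not code from the paper, and the shared variables A, B, and lock are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_int A, B;   /* hypothetical shared variables */

/* RAW: write shared variable A, then read a different shared variable B.
   On processors with store buffers (e.g., x86-TSO) the store to A may be
   delayed past the load of B, so a store-load fence is needed to enforce
   the order; that fence is the expensive instruction. */
bool raw_pattern(void) {
    atomic_store_explicit(&A, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* RAW fence */
    return atomic_load_explicit(&B, memory_order_relaxed) == 0;
}

/* AWAR: a single atomic operation that reads and then writes a shared
   location, here a compare-and-swap on a hypothetical lock word. */
bool awar_pattern(atomic_int *lock) {
    int expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, 1);
}
```

The fence in raw_pattern and the compare-and-swap in awar_pattern correspond, respectively, to the memory-ordering and atomic instructions that the abstract describes as substantially slower than regular instructions.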
