Laws of order: expensive synchronization in concurrent algorithms cannot be eliminated

Building correct and efficient concurrent algorithms is known to be a difficult problem of fundamental importance. To achieve efficiency, designers try to remove unnecessary and costly synchronization. However, not only is this manual trial-and-error process ad hoc, time-consuming and error-prone, but it often leaves designers pondering the question: is it inherently impossible to eliminate certain synchronization, or did I merely fail to eliminate it on this attempt, and should I keep trying? In this paper we answer this question. We prove that it is impossible to build concurrent implementations of classic and ubiquitous specifications, such as sets, queues, stacks, mutual exclusion and read-modify-write operations, that completely eliminate the use of expensive synchronization. Specifically, we prove that one cannot avoid the use of either: i) read-after-write (RAW), where a write to shared variable A is followed by a read of a different shared variable B without a write to B in between, or ii) atomic write-after-read (AWAR), where an atomic operation reads and then writes to shared locations. Unfortunately, enforcing RAW or AWAR is expensive on all current mainstream processors. To enforce RAW, memory ordering instructions (also called fences or barriers) must be used. To enforce AWAR, atomic instructions such as compare-and-swap are required. These instructions are typically substantially slower than regular instructions. Although algorithm designers frequently struggle to avoid RAW and AWAR, their attempts are often futile. Our result characterizes the cases where avoiding RAW and AWAR is impossible. On the flip side, our result can be used to guide designers towards new algorithms where RAW and AWAR can be eliminated.
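
The two patterns named in the abstract can be illustrated with a minimal C++ sketch. This is not code from the paper: the names `flag_a`, `flag_b`, `try_enter_a`, `try_enter_b` and `try_lock` are hypothetical, and the use of `std::atomic` with a sequentially consistent fence is just one assumed way of expressing RAW and AWAR in a mainstream language.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variables standing in for "A" and "B" in the abstract.
std::atomic<int> flag_a{0};
std::atomic<int> flag_b{0};

// RAW: a write to shared variable A followed by a read of a different
// shared variable B. The fence prevents the store from being reordered
// after the load, which is the ordering the abstract calls expensive.
bool try_enter_a() {
    flag_a.store(1, std::memory_order_relaxed);           // write to A
    std::atomic_thread_fence(std::memory_order_seq_cst);  // enforce RAW order
    return flag_b.load(std::memory_order_relaxed) == 0;   // read of B
}

// The symmetric operation for a second thread: write B, then read A.
bool try_enter_b() {
    flag_b.store(1, std::memory_order_relaxed);           // write to B
    std::atomic_thread_fence(std::memory_order_seq_cst);  // enforce RAW order
    return flag_a.load(std::memory_order_relaxed) == 0;   // read of A
}

// AWAR: a single atomic operation that reads a shared location and then
// writes it. Compare-and-swap succeeds only if the value read was 0.
bool try_lock(std::atomic<int>& lock) {
    int expected = 0;
    return lock.compare_exchange_strong(expected, 1);
}
```

With the fences in place, two threads that call `try_enter_a` and `try_enter_b` concurrently cannot both get `true`; without them, a processor that buffers stores may let both reads return 0. `try_lock` avoids the fence but pays for an atomic read-modify-write instead, which is the trade-off the abstract describes.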
