Fast concurrent queues for x86 processors

Conventional wisdom in designing concurrent data structures is to use the most powerful synchronization primitive, namely compare-and-swap (CAS), and to avoid contended hot spots. In building concurrent FIFO queues, this reasoning has led researchers to propose combining-based concurrent queues. This paper takes a different approach, showing how to rely on fetch-and-add (F&A), a less powerful primitive that is available on x86 processors, to construct a nonblocking (lock-free) linearizable concurrent FIFO queue. Although the F&A is itself a contended hot spot, the queue outperforms combining-based implementations by 1.5x to 2.5x at all concurrency levels on an x86 server with four multicore processors, in both single-processor and multi-processor executions.
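
To make the contrast between the two primitives concrete, here is a minimal C++ sketch of claiming the next index from a shared counter, once with F&A and once with a CAS retry loop. It is not the queue described in the paper; the names `claim_faa`, `claim_cas`, and `claims_are_unique` are hypothetical and introduced only for illustration. The point it shows is that an F&A always completes in one step, whereas a CAS can fail under contention and must retry.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Claim the next index with a single fetch-and-add. This never retries.
inline std::uint64_t claim_faa(std::atomic<std::uint64_t>& counter) {
    return counter.fetch_add(1, std::memory_order_acq_rel);
}

// Claim the next index with a CAS retry loop.
// `retries` is incremented once for every failed CAS attempt.
inline std::uint64_t claim_cas(std::atomic<std::uint64_t>& counter,
                               std::uint64_t& retries) {
    std::uint64_t seen = counter.load(std::memory_order_relaxed);
    while (!counter.compare_exchange_weak(seen, seen + 1,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
        ++retries;  // on failure, `seen` now holds the value just observed
    }
    return seen;
}

// Hypothetical driver: `threads` workers each claim `per_thread` indices.
// Returns true if the claimed indices are exactly 0, 1, ..., n-1.
inline bool claims_are_unique(bool use_faa, int threads, int per_thread) {
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::vector<std::uint64_t>> claimed(threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            std::uint64_t retries = 0;
            for (int i = 0; i < per_thread; ++i) {
                claimed[t].push_back(use_faa ? claim_faa(counter)
                                             : claim_cas(counter, retries));
            }
        });
    }
    for (auto& w : workers) w.join();

    // Merge per-thread results and check that every index appears once.
    std::vector<std::uint64_t> all;
    for (auto& v : claimed) all.insert(all.end(), v.begin(), v.end());
    std::sort(all.begin(), all.end());
    for (std::size_t i = 0; i < all.size(); ++i) {
        if (all[i] != i) return false;
    }
    return true;
}
```

Both variants hand out each index exactly once; the difference is only in how much work is wasted under contention. Calling `claims_are_unique(true, 4, 1000)` exercises the F&A path and `claims_are_unique(false, 4, 1000)` the CAS path.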
