Paper information - Fast concurrent queues for x86 processors

Fast concurrent queues for x86 processors

Conventional wisdom in designing concurrent data structures is to use the most powerful synchronization primitive, namely compare-and-swap (CAS), and to avoid contended hot spots. In building concurrent FIFO queues, this reasoning has led researchers to propose combining-based concurrent queues. This paper takes a different approach, showing how to rely on fetch-and-add (F&A), a less powerful primitive that is available on x86 processors, to construct a nonblocking (lock-free) linearizable concurrent FIFO queue which, despite the F&A being a contended hot spot, outperforms combining-based implementations by 1.5x to 2.5x in all concurrency levels on an x86 server with four multicore processors, in both single-processor and multi-processor executions.
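The contrast between the two primitives named in the abstract can be illustrated with a minimal sketch. It is not the paper's queue algorithm; it only shows how a thread might claim a unique slot index from a shared counter with each primitive. The function names `claim_with_cas` and `claim_with_faa` are hypothetical, chosen for this illustration. With CAS, a thread that loses a race must retry; with F&A, every call succeeds and returns a distinct index. On x86, compilers typically lower `fetch_add` to a single `lock xadd` instruction.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: claim the next index with a CAS retry loop.
// If another thread changes the counter between our read and our CAS,
// the CAS fails, `seen` is refreshed with the current value, and we retry.
std::uint64_t claim_with_cas(std::atomic<std::uint64_t>& counter) {
    std::uint64_t seen = counter.load();
    while (!counter.compare_exchange_weak(seen, seen + 1)) {
        // Lost the race (or failed spuriously); `seen` now holds the
        // latest value, so loop and try again.
    }
    return seen;  // the index this thread claimed
}

// Hypothetical helper: claim the next index with fetch-and-add.
// The operation never fails: each caller gets the previous value,
// and no two callers can receive the same one.
std::uint64_t claim_with_faa(std::atomic<std::uint64_t>& counter) {
    return counter.fetch_add(1);
}
```

Both helpers return the same kind of result when run sequentially, so they can be swapped in a design. The difference appears only under contention, where the CAS loop may spin an unbounded number of times while the F&A version completes in one atomic step.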

Yehuda Afek | Adam Morrison

[1] Maurice Herlihy, et al. Linearizability: a correctness condition for concurrent objects, 1990, TOPLAS.

[2] Maged M. Michael. Hazard pointers: safe memory reclamation for lock-free objects, 2004, IEEE Transactions on Parallel and Distributed Systems.

[3] Nir Shavit, et al. Lock Cohorting, 2015, ACM Trans. Parallel Comput.

[4] Nir Shavit, et al. The Baskets Queue, 2007, OPODIS.

[5] Erez Petrank, et al. Wait-free queues with multiple enqueuers and dequeuers, 2011, PPoPP '11.

[6] Panagiota Fatourou, et al. Revisiting the combining synchronization technique, 2012, PPoPP '12.

[7] Nir Shavit, et al. An Optimistic Approach to Lock-Free FIFO Queues, 2004, DISC.

[8] Maurice Herlihy, et al. Wait-free synchronization, 1991, TOPLAS.

[9] Guy E. Blelloch, et al. Scalable Room Synchronizations, 2003, Theory of Computing Systems.

[10] Maurice Herlihy, et al. The art of multiprocessor programming, 2020, PODC '06.

[11] Guy E. Blelloch, et al. Combinable memory-block transactions, 2008, SPAA '08.

[12] Robert Colvin, et al. Formal verification of an array-based nonblocking queue, 2005, 10th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS'05).

[13] Yi Zhang, et al. A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems, 2001, SPAA '01.

[14] Eric Freudenthal, et al. Process coordination with fetch-and-increment, 1991.

[15] Nir Shavit, et al. Flat combining and the synchronization-parallelism tradeoff, 2010, SPAA '10.

[16] Francesco Zappa Nardelli, et al. x86-TSO: a rigorous and usable programmer's model for x86 multiprocessors, 2010, Commun. ACM.

[17] Maged M. Michael, et al. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms, 1996, PODC '96.

[18] Panagiota Fatourou, et al. A highly-efficient wait-free universal construction, 2011, SPAA '11.

[19] Larry Rudolph, et al. Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors, 1983, TOPLAS.

[20] Mark Moir, et al. Using elimination to implement scalable and lock-free FIFO queues, 2005, SPAA '05.

[21] Niloufar Shafiei. Non-blocking Array-Based Algorithms for Stacks and Queues, 2009, ICDCN.