Integrating Lock-Free and Combining Techniques for a Practical and Scalable FIFO Queue

Concurrent FIFO queues can generally be classified as either lock-free or combining-based. Lock-free queues require manual parameter tuning to control the level of contention among parallel operations, while combining-based queues become bottlenecked at high concurrency levels because a single combiner thread executes the operations sequentially. In this paper, we introduce a different approach that uses lock-free and combining techniques synergistically to design a practical and scalable concurrent queue algorithm. As a result, we achieve high scalability without any parameter tuning: in our experimental results for 80-thread average throughput, our queue algorithm outperforms the most widely used Michael and Scott queue by 14.3 times, the best-performing combining-based queue by 1.6 times, and the best-performing x86-dependent lock-free queue by 1.7 percent. In addition, we designed our algorithm so that the life cycle of a node is the same as that of its element. This has significant advantages over prior work: an efficient implementation is possible without dedicated memory management schemes, which are supported only in some languages, may cause a performance bottleneck, or are patented. Moreover, because the life cycles of an element and its node are synchronized, application developers can further optimize memory management.
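
The point about a node sharing the life cycle of its element can be illustrated with an intrusive layout, in which the link node is embedded in the element so that one allocation and one free cover both. The following is only a minimal single-threaded sketch of that layout, not the queue algorithm proposed in the paper: it omits all lock-free and combining synchronization, and every name in it (`qnode`, `message`, `fifo`, `fifo_enqueue`, `fifo_dequeue`, `message_of`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; single-threaded illustration with no synchronization. */
struct qnode {
    struct qnode *next;
};

/* The application element embeds its queue node, so the node is
 * allocated and released together with the element. */
struct message {
    int payload;
    struct qnode node;
};

struct fifo {
    struct qnode *head; /* oldest node, or NULL when empty */
    struct qnode *tail; /* newest node, or NULL when empty */
};

static void fifo_init(struct fifo *q) {
    q->head = NULL;
    q->tail = NULL;
}

/* Link a caller-owned node at the tail; the queue allocates nothing. */
static void fifo_enqueue(struct fifo *q, struct qnode *n) {
    assert(n != NULL);
    n->next = NULL;
    if (q->tail != NULL) {
        q->tail->next = n;
    } else {
        q->head = n;
    }
    q->tail = n;
}

/* Unlink and return the oldest node, or NULL if the queue is empty.
 * Ownership of the node (and its enclosing element) returns to the caller. */
static struct qnode *fifo_dequeue(struct fifo *q) {
    struct qnode *n = q->head;
    if (n == NULL) {
        return NULL;
    }
    q->head = n->next;
    if (q->head == NULL) {
        q->tail = NULL;
    }
    n->next = NULL;
    return n;
}

/* Recover the enclosing element from a pointer to its embedded node. */
static struct message *message_of(struct qnode *n) {
    return (struct message *)((char *)n - offsetof(struct message, node));
}
```

Because the queue never allocates or frees nodes itself in this layout, the application decides where elements live (stack, pool, or heap), which is the kind of memory-management freedom the abstract refers to.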
