An Efficient Unbounded Lock-Free Queue for Multi-core Systems

The use of efficient synchronization mechanisms is crucial for implementing fine grained parallel programs on modern shared cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single-Consumer (SPSC) coordination using unbounded queues. A novel unbounded SPSC algorithm capable of reducing the row synchronization latency and speeding up Producer-Consumer coordination is presented. The algorithm has been extensively tested on a shared-cache multi-core platform and a sketch proof of correctness is presented. The queues proposed have been used as basic building blocks to implement the FastFlow parallel framework, which has been demonstrated to offer very good performance for fine-grain parallel applications.

[1]  Guang R. Gao,et al.  Toward high-throughput algorithms on many-core architectures , 2012, TACO.

[2]  Massimo Torquati,et al.  Porting Decision Tree Algorithms to Multicore using FastFlow , 2010, ECML/PKDD.

[3]  Nir Shavit,et al.  Work dealing , 2002, SPAA '02.

[4]  Leslie Lamport,et al.  How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs , 2016, IEEE Transactions on Computers.

[5]  Maged M. Michael,et al.  Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors , 1998, J. Parallel Distributed Comput..

[6]  Leslie Lamport,et al.  Concurrent reading and writing , 1977, Commun. ACM.

[7]  Massimo Torquati,et al.  Single-Producer/Single-Consumer Queues on Shared Cache Multi-Core Systems , 2010, ArXiv.

[8]  William Thies,et al.  StreamIt: A Language for Streaming Applications , 2002, CC.

[9]  Sarita V. Adve,et al.  Shared Memory Consistency Models: A Tutorial , 1996, Computer.

[10]  Theodore Johnson,et al.  A Nonblocking Algorithm for Shared Queues Using Compare-and-Swap , 1994, IEEE Trans. Computers.

[11]  John Giacomoni,et al.  FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue , 2008, PPoPP.

[12]  Massimo Torquati,et al.  Porting Decision Tree Building and Pruning Algorithms to Multicore using FastFlow , 2011 .

[13]  Gilles Kahn,et al.  The Semantics of a Simple Language for Parallel Programming , 1974, IFIP Congress.

[14]  Torquati Massimo,et al.  Efficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed. , 2009 .

[15]  Mark Moir,et al.  Using elimination to implement scalable and lock-free FIFO queues , 2005, SPAA '05.

[16]  James Reinders,et al.  Intel threading building blocks - outfitting C++ for multi-core processor parallelism , 2007 .

[17]  Patrick P. C. Lee,et al.  A lock-free, cache-efficient multi-core synchronization mechanism for line-rate network traffic monitoring , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[18]  Yi Zhang,et al.  A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems , 2001, SPAA '01.

[19]  Jalal Kawash,et al.  Critical sections and producer/consumer queues in weak memory systems , 1997, Proceedings of the 1997 International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN'97).

[20]  Maged M. Michael Hazard pointers: safe memory reclamation for lock-free objects , 2004, IEEE Transactions on Parallel and Distributed Systems.

[21]  Nir Shavit,et al.  An optimistic approach to lock-free FIFO queues , 2004, Distributed Computing.