Simple, fast, and practical non-blocking and blocking concurrent queue algorithms

Drawing ideas from previous authors, we present a new non-blocking concurrent queue algorithm and a new two-lock queue algorithm in which one enqueue and one dequeue can proceed concurrently. Both algorithms are simple, fast, and practical; we were surprised not to find them in the literature. Experiments on a 12-node SGI Challenge multiprocessor indicate that the new non-blocking queue consistently outperforms the best known alternatives; it is the clear algorithm of choice for machines that provide a universal atomic primitive (e.g., compare_and_swap or load_linked/store_conditional). The two-lock concurrent queue outperforms a single lock when several processes are competing simultaneously for access; it appears to be the algorithm of choice for busy queues on machines with non-universal atomic primitives (e.g., test_and_set). Since much of the motivation for non-blocking algorithms is rooted in their immunity to large, unpredictable delays in process execution, we report experimental results both for systems with dedicated processors and for systems with several processes multiprogrammed on each processor.
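The two-lock idea can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so the following assumes a singly linked list that always begins with a dummy node, with one lock guarding the head and another guarding the tail. The names `TwoLockQueue`, `_Node`, `enqueue`, and `dequeue` are illustrative choices, not taken from the paper. Because the dummy node keeps enqueuers and dequeuers from touching the same pointer field under different locks, one enqueue and one dequeue can run at the same time.

```python
import threading


class _Node:
    """List cell; the first node in the list is always a dummy."""
    __slots__ = ("value", "next")

    def __init__(self, value=None):
        self.value = value
        self.next = None


class TwoLockQueue:
    """Sketch of a two-lock FIFO queue (assumed dummy-node design)."""

    def __init__(self):
        dummy = _Node()
        self._head = dummy            # always points at the dummy node
        self._tail = dummy            # always points at the last node
        self._head_lock = threading.Lock()
        self._tail_lock = threading.Lock()

    def enqueue(self, value):
        node = _Node(value)           # allocate outside the critical section
        with self._tail_lock:         # only enqueuers contend for this lock
            self._tail.next = node    # link the new node after the last one
            self._tail = node         # swing tail to the new node

    def dequeue(self):
        """Return (True, value), or (False, None) if the queue is empty."""
        with self._head_lock:         # only dequeuers contend for this lock
            first = self._head.next   # first real node, if any
            if first is None:
                return False, None
            value = first.value
            first.value = None        # 'first' becomes the new dummy
            self._head = first
            return True, value
```

A producer thread calling `enqueue` and a consumer thread calling `dequeue` take different locks, so neither blocks the other; two producers (or two consumers) still serialize on their shared lock.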
