Relative performance of preemption-safe locking and non-blocking synchronization on multiprogrammed shared memory multiprocessors

Most multiprocessors are multiprogrammed to achieve acceptable response time. Unfortunately, inopportune preemption may significantly degrade the performance of synchronized parallel applications. To address this problem, researchers have developed two principal strategies for concurrent, atomic update of shared data structures: (1) preemption-safe locking and (2) non-blocking (lock-free) algorithms. Preemption-safe locking requires kernel support. Non-blocking algorithms generally require a universal atomic primitive, and are widely regarded as inefficient. We present a comparison of the two alternative strategies, focusing on four simple but important concurrent data structures (stacks, FIFO queues, priority queues, and counters) in microbenchmarks and real applications on a 12-processor SGI Challenge multiprocessor. Our results indicate that data-structure-specific non-blocking algorithms, which exist for stacks, FIFO queues, and counters, can work extremely well: not only do they outperform preemption-safe lock-based algorithms on multiprogrammed machines, they also outperform ordinary locks on dedicated machines. At the same time, since general-purpose non-blocking techniques do not yet appear to be practical, preemption-safe locks remain the preferred alternative for complex data structures: they outperform conventional locks by significant margins on multiprogrammed systems.
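To illustrate what a data-structure-specific non-blocking algorithm built on a universal atomic primitive can look like, the following is a minimal sketch of a non-blocking counter using compare-and-swap. It is not the authors' code: it assumes C11 `<stdatomic.h>` as the source of the primitive, and the names `nb_counter`, `nb_counter_init`, and `nb_counter_fetch_add` are hypothetical, chosen only for this example.

```c
#include <stdatomic.h>

/* Hypothetical non-blocking counter; the names are illustrative only. */
typedef struct {
    atomic_long value;
} nb_counter;

/* Set the starting value before the counter is shared between threads. */
void nb_counter_init(nb_counter *c, long initial) {
    atomic_init(&c->value, initial);
}

/*
 * Atomically add delta and return the value seen before the update.
 * No lock is held at any point, so a thread preempted inside this
 * function cannot stop other threads from completing their updates.
 */
long nb_counter_fetch_add(nb_counter *c, long delta) {
    long old = atomic_load(&c->value);
    /* On failure the CAS stores the current value into old, so each
       retry recomputes old + delta from a fresh snapshot. */
    while (!atomic_compare_exchange_weak(&c->value, &old, old + delta)) {
        /* Another thread updated the counter first; try again. */
    }
    return old;
}
```

A thread would call `nb_counter_fetch_add(&c, 1)` where a lock-based version would acquire a lock, increment, and release. The retry loop only repeats when some other thread's update succeeded, which is the property that makes the scheme tolerant of preemption.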
