Multicore Locks: The Case Is Not Closed Yet

NUMA multicore machines are pervasive, and many multithreaded applications suffer from lock contention. To mitigate this issue, application and library developers can choose from the plethora of optimized mutex lock algorithms designed over the past 25 years. Unfortunately, there is currently no broad study of how these optimized lock algorithms behave on realistic applications. In this paper, we attempt to fill this gap: we study the performance of 27 state-of-the-art mutex lock algorithms on 35 applications. Our study shows that the case is not yet closed regarding locking on multicore machines. Indeed, our findings include the following: (i) at its optimized contention level, no single lock is the best for more than 52% of the studied workloads; (ii) every lock is harmful for several applications, even when the application's parallelism is properly tuned; (iii) for several applications, the best lock changes as the number of threads varies. These findings call for further research on optimized lock algorithms and on dynamic adaptation of contention management.
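For a concrete point of reference, here is a minimal sketch of one classic mutex lock algorithm, the ticket lock, written in C11 with POSIX threads. It is illustrative only and is not code from the study. The names `ticket_lock_t`, `run_demo`, `NTHREADS` and `ITERS` are hypothetical, and the `sched_yield()` call in the wait loop is an added assumption so the demo still makes progress when threads outnumber cores.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Minimal ticket lock (illustrative sketch, not code from the study). */
typedef struct {
    atomic_uint next_ticket; /* next ticket handed to an arriving thread */
    atomic_uint now_serving; /* ticket currently allowed into the critical section */
} ticket_lock_t;

static void ticket_lock_init(ticket_lock_t *l) {
    atomic_init(&l->next_ticket, 0u);
    atomic_init(&l->now_serving, 0u);
}

static void ticket_lock_acquire(ticket_lock_t *l) {
    /* Taking a ticket fixes this thread's place in FIFO order. */
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1u, memory_order_relaxed);
    /* Every waiter polls the same now_serving word, so each release forces
       all waiters to re-fetch that cache line. */
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
        sched_yield(); /* assumption: yield instead of pure spinning */
}

static void ticket_lock_release(ticket_lock_t *l) {
    unsigned cur = atomic_load_explicit(&l->now_serving, memory_order_relaxed);
    /* Sanity check: a free lock with no waiters has next_ticket == now_serving,
       so this catches some (not all) unbalanced releases. */
    assert(atomic_load_explicit(&l->next_ticket, memory_order_relaxed) != cur);
    /* Only the holder writes now_serving, so a load + store (no RMW) suffices. */
    atomic_store_explicit(&l->now_serving, cur + 1u, memory_order_release);
}

/* Hypothetical demo: NTHREADS threads each increment a shared counter ITERS times. */
#define NTHREADS 4
#define ITERS 10000

static ticket_lock_t demo_lock;
static long demo_counter;

static void *demo_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        ticket_lock_acquire(&demo_lock);
        demo_counter++; /* critical section protected by the ticket lock */
        ticket_lock_release(&demo_lock);
    }
    return NULL;
}

/* Returns the final counter value, or -1 if a thread could not be created. */
long run_demo(void) {
    pthread_t threads[NTHREADS];
    int started = 0;
    ticket_lock_init(&demo_lock);
    demo_counter = 0;
    for (; started < NTHREADS; started++)
        if (pthread_create(&threads[started], NULL, demo_worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return started == NTHREADS ? demo_counter : -1;
}
```

Calling `run_demo()` from a `main` function (compiled with `-pthread`) should return `NTHREADS * ITERS`. The single shared `now_serving` word is what makes the ticket lock fair but sensitive to contention; queue locks such as MCS instead give each waiter its own flag to spin on, which is one example of the design trade-offs that make the best lock depend on the workload.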
