Locking Made Easy

A priori, locking seems easy: To protect shared data from concurrent accesses, it is sufficient to lock before accessing the data and unlock after. Nevertheless, making locking efficient requires fine-tuning (a) the granularity of locks and (b) the locking strategy for each lock and possibly each workload. As a result, locking can become very complicated to design and debug. We present GLS, a middleware that makes lock-based programming simple and effective. GLS offers the classic lock-unlock interface of locks. However, in contrast to classic lock libraries, GLS does not require any effort from the programmer for allocating and initializing locks, nor for selecting the appropriate locking strategy. With GLS, all these intricacies of locking are hidden from the programmer. GLS is based on GLK, a generic lock algorithm that dynamically adapts to the contention level on the lock object. GLK is able to deliver the best performance among simple spinlocks, scalable queue-based locks, and blocking locks. Furthermore, GLS offers several debugging options for easily detecting various lock-related issues, such as deadlocks. We evaluate GLS and GLK on two modern hardware platforms, using several software systems (i.e., HamsterDB, Kyoto Cabinet, Memcached, MySQL, SQLite) and show how GLK improves their performance by 23% on average, compared to their default locking strategies. We illustrate the simplicity of using GLS and its debugging facilities by rewriting the synchronization code for Memcached and detecting two potential correctness issues.
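The idea of a lock-unlock interface that needs no lock allocation or initialization can be illustrated with a minimal sketch. It is an illustration only, not the GLS API or implementation: the class `AddressKeyedLocks`, its `lock`/`unlock` methods, and the `"counter"` key are hypothetical names chosen here. The sketch assumes that locks are looked up by a key standing in for the address of the protected data, and are created on first use. It uses an ordinary blocking mutex and does not attempt the contention-adaptive behavior attributed to GLK.

```python
import threading


class AddressKeyedLocks:
    """Hypothetical lock service: locks are found by key and created on first use."""

    def __init__(self):
        self._table = {}                      # key -> threading.Lock
        self._table_guard = threading.Lock()  # protects the table itself

    def _get(self, key):
        # Look up the lock for this key, creating it lazily if it does not exist yet.
        with self._table_guard:
            lock = self._table.get(key)
            if lock is None:
                lock = threading.Lock()
                self._table[key] = lock
            return lock

    def lock(self, key):
        self._get(key).acquire()

    def unlock(self, key):
        self._get(key).release()


service = AddressKeyedLocks()
counter = {"value": 0}


def worker(iterations):
    for _ in range(iterations):
        # The caller never declares or initializes a lock object.
        service.lock("counter")
        counter["value"] += 1
        service.unlock("counter")


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this sketch every lookup goes through a single table guard, which would itself become a bottleneck under load; a practical design would presumably need a more scalable lookup structure.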
