Reducing Scalability Collapse via Requester-Based Locking on Multicore Systems

In response to the increasing ubiquity of multicore processors, multithreaded applications that strive to exploit the full potential of these processors have been widely developed. Unfortunately, lock contention within operating systems can limit the scalability of multicore systems so severely that increasing the number of cores can actually reduce performance (i.e., scalability collapse). Existing lock implementations have disadvantages in scalability, resource utilization, and energy efficiency. In this work, we observe that the number of tasks requesting a lock correlates significantly with the occurrence of scalability collapse. Based on this observation, we propose a novel lock implementation that allows tasks blocked on a lock to either spin or stay in a power-saving state according to the number of lock requesters. We call our lock implementation a requester-based lock and implement it in the Linux kernel to replace its default spin lock. Our analysis shows that the best policy for a task waiting for a lock to become free is to enter the power-saving state immediately after noticing that the lock cannot be acquired. We evaluate the requester-based lock using micro- and macro-benchmarks on AMD 32-core and Intel 40-core systems. Experimental results indicate that our lock removes scalability collapse completely for most applications. Furthermore, it shows better scalability and energy efficiency than mutex locks and adaptive locks.
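
The spin-or-sleep decision described above can be illustrated with a small user-space sketch. This is not the kernel implementation from the paper: it is a Python analogy in which a blocking `threading.Lock.acquire()` stands in for the power-saving state, and the class name `RequesterBasedLock`, the `spin_threshold` parameter, and the helper `run_demo` are all hypothetical names introduced only for illustration. The threshold value is an assumption; the abstract does not state one.

```python
import threading


class RequesterBasedLock:
    """User-space analogy: a waiter spins while few tasks request the lock
    and blocks (sleeps) as soon as the requester count exceeds a threshold."""

    def __init__(self, spin_threshold=2):
        self._lock = threading.Lock()          # the lock being protected
        self._count_guard = threading.Lock()   # protects the requester counter
        self._requesters = 0                   # holder plus all waiters
        self.spin_threshold = spin_threshold   # hypothetical tuning knob

    def acquire(self):
        with self._count_guard:
            self._requesters += 1
        while True:
            # Fast path and spin path: try to take the lock without blocking.
            if self._lock.acquire(blocking=False):
                return
            with self._count_guard:
                crowded = self._requesters > self.spin_threshold
            if crowded:
                # Too many requesters: stop spinning and sleep right away.
                self._lock.acquire()
                return

    def release(self):
        with self._count_guard:
            self._requesters -= 1
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()


def run_demo(n_threads, n_iters):
    """Increment a shared counter under the lock and return its final value."""
    lock = RequesterBasedLock(spin_threshold=2)
    state = {"counter": 0}

    def worker():
        for _ in range(n_iters):
            with lock:
                state["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]
```

Calling `run_demo(4, 500)` exercises both paths: with only one or two requesters a waiter keeps retrying, while a third requester pushes the count past the assumed threshold and waiters block instead. The sketch only demonstrates the decision rule; it says nothing about the scalability or energy results reported for the kernel lock.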
