Contention-aware scheduler: unlocking execution parallelism in multithreaded Java programs

In multithreaded programming, locks are frequently used as a mechanism for synchronization. Because today's operating systems do not consider lock usage as a scheduling criterion, scheduling decisions can be unfavorable to multithreaded applications, leading to performance issues such as convoying and heavy lock contention in systems with multiple processors. Previous efforts to address these issues (e.g., transactional memory, lock-free data structures) often treat scheduling decisions as "a fact of life," and therefore try to cope with the consequences of undesirable scheduling instead of dealing with the problem directly. In this paper, we introduce the Contention-Aware Scheduler (CA-Scheduler), which is designed to support efficient execution of large multithreaded Java applications in multiprocessor systems. Our proposed scheduler employs a scheduling policy that reduces lock contention. As will be shown in this paper, our prototype implementation of the CA-Scheduler in Linux and the Sun HotSpot virtual machine incurs only 3.5% runtime overhead, while the overall performance differences, when compared with a system with no contention awareness, range from a degradation of 3% in a small multithreaded benchmark to an improvement of 15% in a large Java application server benchmark.
