A study of the scalability of stop-the-world garbage collectors on multicores

Large-scale multicore architectures create new challenges for garbage collectors (GCs). In particular, throughput-oriented stop-the-world algorithms perform well with a small number of cores, but have been shown to degrade badly beyond approximately 8 cores on a 48-core machine running OpenJDK 7. This negative result raises the question of whether the stop-the-world design has intrinsic limitations that would require a radically different approach. Our study suggests that the answer is no, and that there is no compelling scalability reason to discard the existing highly-optimised throughput-oriented GC code on contemporary hardware. This paper studies the default throughput-oriented garbage collector of OpenJDK 7, called Parallel Scavenge. We identify its bottlenecks and show how to eliminate them using well-established parallel programming techniques. On the SPECjbb2005, SPECjvm2008 and DaCapo 9.12 benchmarks, the improved GC matches the performance of Parallel Scavenge at low core counts, but scales well up to 48 cores.
