Multicore garbage collection with local heaps

In a parallel, shared-memory language with a garbage-collected heap, it is desirable for each processor to perform minor garbage collections independently. Although the idea is obvious, it is difficult to make it pay off in practice, especially in languages where mutation is common. We present several techniques that substantially improve the state of the art. We describe these techniques in the context of a full-scale implementation of Haskell, and demonstrate that our local-heap collector substantially improves scaling, peak performance, and robustness.
