Eliminating read barriers through procrastination and cleanliness

Managed languages typically use read barriers to interpret forwarding pointers, which are introduced to keep track of copied objects. For example, in a multicore environment with thread-local heaps and a global shared heap, an object initially allocated on a local heap may be copied to the shared heap if it becomes the source of a store operation whose target location resides on the shared heap. As part of the copy operation, a forwarding pointer may be established in the original object to point to the copy. This level of indirection avoids the need to update every reference to the copied object. In this paper, we consider the design of a managed runtime that eliminates read barriers. Our design is premised on the availability of a sufficient degree of concurrency to stall operations that would otherwise necessitate the copy. Stalled actions are deferred until the next local collection, which avoids exposing forwarding pointers to the mutator. In certain important cases, procrastination is unnecessary: lightweight runtime techniques can sometimes allow objects to be copied eagerly when their set of incoming references is known, or when it can be determined that having multiple copies would not violate program semantics. We evaluate our techniques on three platforms: a 16-core AMD64 machine, a 48-core Intel SCC, and an 864-core Azul Vega 3. Experimental results over a range of parallel benchmarks indicate that our approach leads to notable performance gains (20-32% on average) without incurring any additional complexity.
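The forwarding-pointer mechanism described above can be illustrated with a small sketch. It is not the runtime evaluated in the paper; it is a toy Python model, and every name in it (`Obj`, `read_barrier`, `promote`, `load`, `store`) is a hypothetical introduced for illustration. It shows only the baseline scheme: promoting an object to the shared heap leaves a forwarding pointer behind, and every read must pass through a barrier to reach the current copy.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Obj:
    """Toy heap object: named fields plus an optional forwarding pointer."""
    fields: Dict[str, Any] = field(default_factory=dict)
    shared: bool = False             # True once the object lives on the shared heap
    forward: Optional["Obj"] = None  # set when the object has been copied


def read_barrier(obj: Obj) -> Obj:
    """Follow forwarding pointers until the current copy is reached."""
    while obj.forward is not None:
        obj = obj.forward
    return obj


def promote(obj: Obj) -> Obj:
    """Copy a local object to the shared heap, leaving a forwarding pointer."""
    obj = read_barrier(obj)
    if obj.shared:
        return obj                   # already on the shared heap, nothing to copy
    copy = Obj(dict(obj.fields), shared=True)
    obj.forward = copy               # stale references now reach the copy via the barrier
    return copy


def load(obj: Obj, name: str) -> Any:
    """Barriered read: always sees the current copy."""
    return read_barrier(obj).fields[name]


def store(obj: Obj, name: str, value: Any) -> None:
    """Barriered write: always updates the current copy."""
    read_barrier(obj).fields[name] = value


# A local object with a second reference that is never updated after the copy.
local = Obj({"x": 1})
alias = local
promoted = promote(local)
store(promoted, "x", 2)

barriered = load(alias, "x")      # goes through the forwarding pointer
unbarriered = alias.fields["x"]   # bypasses the barrier and reads the stale original
```

In this model the barriered read through `alias` observes the update made to the shared copy, while the direct field access still sees the original value. That check on every read is the cost the design in this paper aims to remove, by ensuring the mutator never encounters a forwarded object in the first place.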
