Efficient nursery sizing for managed languages on multi-core processors with shared caches

In modern programming languages, automatic memory management has become a standard feature for allocating and freeing memory. In this paper, we show that the performance of today's managed languages can degrade significantly due to contention among multiple concurrently running applications that share a cache. To address this problem, we propose changing the programs' memory access patterns by adjusting their nursery sizes. Specifically, we present the Dynamic Nursery Allocator (DNA), an online scheme that automatically adjusts the nursery sizes of multiple managed-language programs running concurrently, without any prior knowledge or offline profiling. Experimental results on a native Intel machine show that, when four applications run concurrently and share the last-level cache, DNA improves system throughput over today's nursery sizing scheme by 16.3% on average and by up to 73%.
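
The abstract does not describe how DNA searches for good nursery sizes. The following Python sketch is a purely hypothetical illustration of what "online adjustment without offline profiling" can look like. It uses greedy hill climbing: in each epoch it tries doubling or halving one program's nursery and keeps the change only if a measured throughput score improves. The class `HillClimbNurserySizer`, the `measure` callback, the MiB bounds, and the `toy_measure` throughput model are all assumptions made for this example. They are not DNA's actual algorithm or the paper's throughput metric.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical callback: returns a throughput score (higher is better)
# observed over one epoch with the given per-program nursery sizes (MiB).
Measure = Callable[[List[int]], float]


@dataclass
class HillClimbNurserySizer:
    """Greedy online hill climbing over per-program nursery sizes.

    Illustrative sketch only; this is NOT the DNA algorithm from the paper.
    """
    sizes: List[int]          # current nursery size per program, in MiB
    min_mib: int = 1          # assumed lower bound
    max_mib: int = 256        # assumed upper bound

    def step(self, measure: Measure) -> bool:
        """One epoch: for each program, try doubling or halving its nursery
        and keep the first move that raises the measured score."""
        best = measure(self.sizes)
        improved = False
        for i in range(len(self.sizes)):
            for candidate in (self.sizes[i] * 2, self.sizes[i] // 2):
                if not self.min_mib <= candidate <= self.max_mib:
                    continue  # stay within the assumed size bounds
                trial = self.sizes.copy()
                trial[i] = candidate
                score = measure(trial)
                if score > best:
                    self.sizes, best, improved = trial, score, True
                    break  # accept the first improving move for program i
        return improved

    def run(self, measure: Measure, max_epochs: int = 50) -> List[int]:
        """Repeat epochs until no single doubling/halving helps (a local optimum)."""
        for _ in range(max_epochs):
            if not self.step(measure):
                break
        return self.sizes


# Made-up throughput model for demonstration only: each program has a
# "sweet spot" nursery size, and the score falls off in log space.
TARGETS_MIB = [4, 32]


def toy_measure(sizes: List[int]) -> float:
    return -sum((math.log2(s) - math.log2(t)) ** 2
                for s, t in zip(sizes, TARGETS_MIB))


if __name__ == "__main__":
    print(HillClimbNurserySizer(sizes=[16, 16]).run(toy_measure))  # → [4, 32]
```

Greedy hill climbing can stop at a local optimum. A practical controller would likely also need to tolerate noisy measurements and program phase changes. Those concerns lie beyond what the abstract states and are not modeled here.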
