SmartStealing: Analysis and Optimization of Work Stealing in Parallel Garbage Collection for Java VM

Parallel garbage collection is used to speed up the collection process on multicore architectures. As with other parallel techniques, balancing the workload among threads is critical to good overall collection performance. To this end, OpenJDK, the current state-of-the-art Java Virtual Machine, employs work stealing to keep GC threads from idling during a collection. However, we found that the current algorithm is not efficient: using it can often make GC performance worse than when work stealing is not used at all. In this paper, we identify three factors that affect work stealing efficiency: determining which tasks can benefit from stealing, how frequently to attempt stealing, and the performance impact of failed stealing attempts. Based on this analysis, we propose SmartStealing, a new algorithm that can automatically decide whether to attempt stealing at a particular point during execution. If stealing is attempted, it can efficiently identify a task to steal from. We then compare collection performance when (i) the default work stealing algorithm is used, (ii) work stealing is not used at all, and (iii) the SmartStealing approach is used. Our evaluation shows that, without modifying the rest of the garbage collection system, SmartStealing reduces the parallel GC execution time for 19 of the 21 benchmarks. The average reduction is 50.4% and the highest reduction is 78.7%. We also investigate the performance of SmartStealing on NUMA and UMA architectures.
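The three factors named above (whether a steal is worthwhile, how often to attempt one, and what a failed attempt costs) can be illustrated with a toy simulation. The sketch below is not the SmartStealing algorithm and does not model OpenJDK's GC task queues; the `Worker` class, the `min_len` threshold, and both stealing policies are hypothetical names and rules introduced only for illustration. It contrasts a baseline that probes a random victim, and may fail, with a gated policy that attempts a steal only when some queue looks long enough to be worth it.

```python
from collections import deque
import random


class Worker:
    """A simulated GC thread with its own double-ended task queue."""

    def __init__(self, wid, tasks=()):
        self.wid = wid
        self.queue = deque(tasks)
        self.done = 0            # tasks this worker has processed
        self.failed_steals = 0   # steal attempts that found an empty queue

    def pop_local(self):
        # The owner takes work from one end of its own queue.
        return self.queue.pop() if self.queue else None


def make_random_steal(seed):
    """Baseline policy (illustrative): probe one random victim per attempt."""
    rng = random.Random(seed)

    def steal(thief, workers):
        victim = rng.choice([w for w in workers if w is not thief])
        if victim.queue:
            return victim.queue.popleft()  # thieves take from the other end
        thief.failed_steals += 1           # wasted attempt
        return None

    return steal


def steal_gated(thief, workers, min_len=2):
    """Hypothetical gate: steal only if the longest queue has >= min_len tasks."""
    victim = max((w for w in workers if w is not thief),
                 key=lambda w: len(w.queue))
    if len(victim.queue) < min_len:
        return None                        # skip the attempt entirely
    return victim.queue.popleft()


def run(workers, steal):
    """Round-based simulation; returns the number of rounds until all queues drain."""
    rounds = 0
    while any(w.queue for w in workers):
        rounds += 1
        for w in workers:
            task = w.pop_local()
            if task is None:
                task = steal(w, workers)
            if task is not None:
                w.done += 1
    return rounds


def make_workers():
    # Assumed imbalanced start: one worker holds all eight tasks.
    return [Worker(0, range(8))] + [Worker(i) for i in range(1, 4)]


if __name__ == "__main__":
    for name, policy in [("random", make_random_steal(0)), ("gated", steal_gated)]:
        ws = make_workers()
        rounds = run(ws, policy)
        failed = sum(w.failed_steals for w in ws)
        print(name, "rounds:", rounds, "failed steals:", failed)
```

In this toy setting the gated policy never records a failed steal, because it declines to attempt one when no queue meets the threshold. A real collector would have to weigh the cost of inspecting queue lengths against the cost of the failed attempts it avoids, which the simulation does not model.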
