Boosting the Priority of Garbage

While hardware is evolving toward heterogeneous multicore architectures, modern software applications are increasingly written in managed languages. Heterogeneity was born of a need to improve energy efficiency; however, we want the performance of our applications not to suffer from limited resources. How best to schedule managed language applications on a mix of big, out-of-order cores and small, in-order cores is an open question, complicated by the host of service threads that perform key tasks such as memory management. These service threads compete with the application for core and memory resources, and garbage collection (GC) must sometimes suspend the application if there is not enough memory available for allocation. In this article, we explore concurrent garbage collection’s behavior, particularly when it becomes critical, and how to schedule it on a heterogeneous system to optimize application performance. While some applications see no difference in performance when GC threads are run on big versus small cores, others—those with GC criticality—see up to an 18% performance improvement. We develop a new, adaptive scheduling algorithm that responds to GC criticality signals from the managed runtime, giving more big-core cycles to the concurrent collector when it is under pressure and in danger of suspending the application. Our experimental results show that our GC-criticality-aware scheduler is robust across a range of heterogeneous architectures with different core counts and frequency scaling and across heap sizes. Our algorithm is performance and energy neutral for GC-uncritical Java applications and significantly speeds up GC-critical applications by 16%, on average, while being 20% more energy efficient for a heterogeneous multicore with three big cores and one small core.

[1]  Lieven Eeckhout,et al.  Exploring multi-threaded Java application performance on multicore hardware , 2012, OOPSLA '12.

[2]  Ling Shao,et al.  Allocation wall: a limiting factor of Java applications on emerging multi-core platforms , 2009, OOPSLA.

[3]  Lieven Eeckhout,et al.  Fairness-aware scheduling on single-ISA heterogeneous multi-cores , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.

[4]  Norman P. Jouppi,et al.  Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction , 2003, MICRO.

[5]  Stacey Jeffery,et al.  HASS: a scheduler for heterogeneous multicore systems , 2009, OPSR.

[6]  Ting Cao,et al.  The Yin and Yang of power and performance for asymmetric hardware and managed software , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[7]  Perry Cheng,et al.  Myths and realities: the performance impact of garbage collection , 2004, SIGMETRICS '04/Performance '04.

[8]  Xi Yang,et al.  Why nothing matters: the impact of zeroing , 2011, OOPSLA '11.

[9]  Lieven Eeckhout,et al.  Cooperative cache scrubbing , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[10]  Sadagopan Srinivasan,et al.  Efficient interaction between OS and architecture in heterogeneous platforms , 2011, OPSR.

[11]  Li Zhao,et al.  QuickIA: Exploring heterogeneous architectures on real prototypes , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[12]  Taiichi Yuasa,et al.  Real-time garbage collection on general-purpose machines , 1990, J. Syst. Softw..

[13]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[14]  Perry Cheng,et al.  The garbage collection advantage: improving program locality , 2004, OOPSLA.

[15]  Hyesoon Kim,et al.  Age based scheduling for asymmetric multiprocessors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[16]  Stephen J. Fink,et al.  The Jalapeño virtual machine , 2000, IBM Syst. J..

[17]  Cliff Click Java on 1000 Cores: Tales of Hardware/Software Co-design , 2009, ECOOP.

[18]  Kathryn S. McKinley,et al.  Immix: a mark-region garbage collector with space efficiency, fast collection, and mutator performance , 2008, PLDI '08.

[19]  Lieven Eeckhout,et al.  Scheduling heterogeneous multi-cores through performance impact estimation (PIE) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[20]  Stijn Eyerman,et al.  Bottle graphs: visualizing scalability bottlenecks in multi-threaded applications , 2013, OOPSLA.

[21]  Soraya Ghiasi,et al.  Scheduling for heterogeneous processors in server systems , 2005, CF '05.

[22]  Tong Li,et al.  Operating system support for overlapping-ISA heterogeneous multi-core architectures , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[23]  Xi Yang,et al.  Looking back on the language and hardware revolutions: measured power, performance, and scaling , 2011, ASPLOS XVI.

[24]  James E. Smith,et al.  A performance counter architecture for computing accurate CPI components , 2006, ASPLOS XII.

[25]  Amer Diwan,et al.  The DaCapo benchmarks: java benchmarking development and analysis , 2006, OOPSLA '06.

[26]  John Kubiatowicz,et al.  GPUs as an opportunity for offloading garbage collection , 2012, ISMM '12.

[27]  Lizy Kurian John,et al.  Impact of virtual execution environments on processor energy consumption and hardware adaptation , 2006, VEE '06.

[28]  Onur Mutlu,et al.  Flexible reference-counting-based hardware acceleration for garbage collection , 2009, ISCA '09.

[29]  Patrick Crowley,et al.  Dynamic thread assignment on heterogeneous multiprocessor architectures , 2006, CF '06.

[30]  Stephen M. Blackburn,et al.  Barriers: friend or foe? , 2004, ISMM '04.

[31]  James E. Smith,et al.  Concurrent garbage collection using hardware-assisted profiling , 2000, ISMM '00.

[32]  Amer Diwan,et al.  Wake up and smell the coffee: evaluation methodology for the 21st century , 2008, CACM.

[33]  Lizy Kurian John,et al.  Efficient program scheduling for heterogeneous multi-core processors , 2009, 2009 46th ACM/IEEE Design Automation Conference.

[34]  Lieven Eeckhout,et al.  Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[35]  Tong Li,et al.  Efficient operating system scheduling for performance-asymmetric multi-core architectures , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[36]  Onur Mutlu,et al.  Bottleneck identification and scheduling in multithreaded applications , 2012, ASPLOS XVII.

[37]  Stijn Eyerman,et al.  Criticality stacks: identifying critical threads in parallel programs using synchronization behavior , 2013, ISCA.

[38]  Lieven Eeckhout,et al.  Power-aware multi-core simulation for early design stage hardware/software co-optimization , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[39]  Dheeraj Reddy,et al.  Bias scheduling in heterogeneous multi-core architectures , 2010, EuroSys '10.

[40]  Kelvin D. Nilsen,et al.  Performance of a hardware-assisted real-time garbage collector , 1994, ASPLOS VI.

[41]  Onur Mutlu,et al.  Accelerating critical section execution with asymmetric multi-core architectures , 2009, ASPLOS.