An opportunistic prediction-based thread scheduling to maximize throughput/watt in AMPs

The importance of dynamic thread scheduling is increasing with the emergence of Asymmetric Multicore Processors (AMPs). Since the computing needs of a thread often vary during its execution, a fixed thread-to-core assignment is sub-optimal. Reassigning threads to cores (thread swapping) when the threads start a new phase with different computational needs can significantly improve the energy efficiency of AMPs. Although identifying phase changes in the threads is not difficult, determining the appropriate thread-to-core assignment is a challenge. The problem is further complicated by the multiple power states that may be available in the cores. To this end, we propose a novel technique that dynamically assesses the needs of the current program phase and determines whether swapping threads between core types, changing the voltage/frequency levels of the cores (DVFS), or both will result in higher throughput/Watt. This is achieved by predicting the expected throughput/Watt of the current program phase at different voltage/frequency levels on all the available core types in the AMP. We show that the benefits from thread swapping and DVFS are orthogonal, demonstrating the potential of the proposed scheme to achieve significant benefits by seamlessly combining the two. We illustrate our approach using a dual-core High-Performance (HP)/Low-Power (LP) AMP with two power states and demonstrate significant throughput/Watt improvements over different baselines.
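The abstract does not detail the predictor or the search procedure. The selection step it describes can, however, be sketched as an exhaustive search over all configurations of a dual-core HP/LP AMP with two power states. This is a minimal sketch under stated assumptions. The per-thread predictions of throughput (instructions per second) and power for every core type and voltage/frequency level are taken as given inputs. Chip-level throughput/Watt is assumed to be total throughput divided by total power. The names `best_configuration`, `decide`, and `min_gain`, and the level labels `"high"`/`"low"`, are all hypothetical. The `min_gain` margin is an assumed stand-in for thread-swap and DVFS transition overheads, so a reconfiguration happens only when it is clearly worthwhile.

```python
from itertools import permutations, product

# Assumed setup: a dual-core AMP with one High-Performance and one Low-Power core,
# each supporting two (hypothetically named) voltage/frequency levels.
CORE_TYPES = ("HP", "LP")
VF_LEVELS = ("high", "low")


def throughput_per_watt(threads, cores, levels, pred):
    """Total predicted throughput divided by total predicted power.

    pred maps (thread, core_type, vf_level) -> (instructions_per_second, watts);
    these values are assumed to come from a per-phase predictor.
    levels[i] is the V/f level of cores[i], the core running threads[i].
    """
    total_ips = 0.0
    total_watts = 0.0
    for t, c, v in zip(threads, cores, levels):
        ips, watts = pred[(t, c, v)]
        total_ips += ips
        total_watts += watts
    return total_ips / total_watts


def best_configuration(threads, pred):
    """Exhaustively search thread-to-core assignments and per-core V/f levels."""
    best = None
    for cores in permutations(CORE_TYPES, len(threads)):
        for levels in product(VF_LEVELS, repeat=len(threads)):
            score = throughput_per_watt(threads, cores, levels, pred)
            if best is None or score > best[0]:
                best = (score, cores, levels)
    return best


def decide(threads, current_cores, current_levels, pred, min_gain=0.05):
    """Switch to the best configuration only if its predicted throughput/Watt
    beats the current one by more than min_gain (a hypothetical margin)."""
    current = throughput_per_watt(threads, current_cores, current_levels, pred)
    score, cores, levels = best_configuration(threads, pred)
    if score > current * (1.0 + min_gain):
        return cores, levels
    return current_cores, current_levels


if __name__ == "__main__":
    # Made-up predictions for two threads, for illustration only.
    pred = {
        ("A", "HP", "high"): (2.0e9, 4.0), ("A", "HP", "low"): (1.5e9, 2.5),
        ("A", "LP", "high"): (0.8e9, 1.0), ("A", "LP", "low"): (0.6e9, 0.7),
        ("B", "HP", "high"): (1.0e9, 3.5), ("B", "HP", "low"): (0.9e9, 2.2),
        ("B", "LP", "high"): (0.9e9, 0.9), ("B", "LP", "low"): (0.7e9, 0.6),
    }
    print(decide(("A", "B"), ("LP", "HP"), ("high", "high"), pred))
    # prints (('HP', 'LP'), ('low', 'low'))
```

With only two threads, two core types, and two levels, the search covers just eight configurations, so brute force is cheap. A larger AMP would need a more scalable search than this exhaustive enumeration.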
