Run-DMC: Runtime dynamic heterogeneous multicore performance and power estimation for energy efficiency

In this paper we propose Run-DMC, an accurate runtime performance and power estimation scheme for dynamic workloads executing on heterogeneous multicore systems. In contrast to previous work, Run-DMC uses fine-grained per-thread metrics that model the Thread Load Contribution (TLC) induced by the native OS scheduling policy to accurately predict performance and power for any possible thread-to-core mapping. This allows the operating system to opportunistically exploit the heterogeneous multicore architecture by dynamically mapping workloads to the most appropriate core type. We have integrated our models into the Linux kernel running on a heterogeneous multicore system with four different core types. Our experimental results show that the Run-DMC models yield up to 97% better energy efficiency than vanilla Linux and up to 44% better energy efficiency than the approach employed by state-of-the-art energy-aware schedulers.
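To make the idea of choosing among thread-to-core mappings concrete, the following is a minimal sketch in Python, not the Run-DMC models themselves. It assumes that per-thread predictions of throughput and power are already available for each core type, and it exhaustively picks the mapping with the highest predicted instructions per second per watt. The thread names, core types (`big`, `little`), the numbers, and the functions `efficiency` and `best_mapping` are all hypothetical. The sketch also ignores core sharing, so it does not model the Thread Load Contribution described above.

```python
from itertools import product

# Hypothetical per-thread predictions for each core type, given as
# (instructions per second, watts). The values are invented for illustration.
PREDICTIONS = {
    "t0": {"big": (2.0e9, 2.0), "little": (1.0e9, 0.5)},
    "t1": {"big": (1.2e9, 1.5), "little": (1.0e9, 0.4)},
}


def efficiency(mapping, predictions):
    """Return total predicted IPS divided by total predicted watts."""
    total_ips = sum(predictions[t][c][0] for t, c in mapping.items())
    total_watts = sum(predictions[t][c][1] for t, c in mapping.items())
    return total_ips / total_watts


def best_mapping(predictions):
    """Exhaustively search all thread-to-core-type assignments.

    Assumes every thread has a prediction for every core type and that
    threads do not share a core (a simplification of this sketch).
    """
    threads = sorted(predictions)
    core_types = sorted({c for per_core in predictions.values() for c in per_core})
    candidates = (
        dict(zip(threads, combo))
        for combo in product(core_types, repeat=len(threads))
    )
    return max(candidates, key=lambda m: efficiency(m, predictions))


if __name__ == "__main__":
    print(best_mapping(PREDICTIONS))
```

Exhaustive search grows exponentially with the number of threads, so it is only practical for small examples like this one; a runtime scheduler would need a cheaper search strategy.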
