Paper information - Reducing Energy per Instruction via Dynamic Resource Allocation and Voltage and Frequency Adaptation in Asymmetric Multicores


With the advent of multicore processors, the emphasis in computation is moving from sequential to parallel processing. Still, applications that require strong sequential performance do not achieve their highest performance/power ratio when executing on current multicore systems. Because computational needs vary significantly across applications and over time, computational resources should be allocated dynamically, on demand, to match each application's current needs and thereby minimize energy consumption. The Energy per Instruction (EPI) can be decreased further by dynamically adapting the voltage and frequency to better fit the changing characteristics of the workload. Not only can a core be forced into a low-power mode when its activity level is low, but the power saved by doing so can be opportunistically re-budgeted to other cores to boost overall system throughput. To this end, we propose a holistic approach to improving energy efficiency that combines heterogeneity, Dynamic Resource Allocation (DRA), and Dynamic Voltage and Frequency Adaptation (DVFA) to adapt the core resources to the changing demands of applications. Our results show that the proposed scheme reduces EPI by about 17.9% compared to the baseline heterogeneous multicore, by 14% compared to the baseline heterogeneous multicore with DVFA only, and by about 16.5% compared to the baseline heterogeneous multicore with DRA only.
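To make the EPI metric and the re-budgeting idea concrete, the following is a minimal sketch that uses the textbook CMOS dynamic-power approximation P = a·C·V²·f and defines EPI as power divided by instruction throughput. It is not the paper's model: the capacitance, IPC, and voltage/frequency operating points are hypothetical values chosen for illustration, static (leakage) power is ignored, and IPC is assumed not to change with frequency.

```python
def dynamic_power(c_eff, voltage, freq_hz, activity=1.0):
    """Textbook CMOS dynamic-power approximation: P = a * C * V^2 * f (watts)."""
    return activity * c_eff * voltage ** 2 * freq_hz


def energy_per_instruction(power_w, freq_hz, ipc):
    """EPI in joules: power divided by instruction throughput (IPC * f)."""
    return power_w / (ipc * freq_hz)


# Hypothetical values for illustration only; none of these come from the paper.
C_EFF = 1e-9  # assumed effective switched capacitance, in farads
IPC = 1.5     # assumed instructions per cycle, held constant across operating points

# Two assumed voltage/frequency operating points for the same core.
high = dynamic_power(C_EFF, voltage=1.2, freq_hz=2.0e9)
low = dynamic_power(C_EFF, voltage=1.0, freq_hz=1.5e9)

epi_high = energy_per_instruction(high, 2.0e9, IPC)
epi_low = energy_per_instruction(low, 1.5e9, IPC)

# Power freed by moving to the lower operating point; under a fixed chip
# budget this headroom could be granted to another, busier core.
saved_power = high - low

print(f"EPI at high point: {epi_high:.3e} J")
print(f"EPI at low point:  {epi_low:.3e} J")
print(f"Power available for re-budgeting: {saved_power:.2f} W")
```

Under these simplifying assumptions the frequency cancels out of the dynamic EPI, so the reduction comes from the lower voltage (EPI scales with V²), while the lower frequency is what frees power for other cores.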
