Dynamic core scaling: Trading off performance and energy beyond DVFS

Dynamic voltage and frequency scaling (DVFS) is commonly employed on modern superscalar processors to reduce energy consumption when peak performance is not needed or not permitted. As technology scales, the effectiveness of DVFS is limited by the shrinking range of viable supply voltages. This work proposes dynamic core scaling (DCS) to extend the performance-energy tradeoff capabilities of superscalar processors. DCS ensures that programs run at a given percentage of their maximum speed while minimizing energy consumption by dynamically adjusting the active superscalar datapath resources. Evaluations using an 8-way superscalar processor implemented on a 45 nm circuit infrastructure show that DCS is more effective than DVFS in performance-energy tradeoffs at the high-performance end. When used together with DVFS, DCS saves an additional 20% of a full-size core's energy on average. At the minimum operating voltage, DVFS stops reducing energy, while DCS still achieves a further energy reduction of 46% on average.
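
The selection goal stated in the abstract (meet a target fraction of maximum speed, then minimize energy) can be illustrated with a minimal sketch. This is not the authors' controller: the `CoreConfig` type, the `pick_config` function, and every number in `CONFIGS` are hypothetical placeholders chosen for illustration, not results from the paper. The sketch also assumes that the relative performance and energy of each datapath configuration are already known, whereas a real mechanism would have to estimate them at run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreConfig:
    """One hypothetical setting of the active datapath resources."""
    name: str
    rel_perf: float    # speed relative to the full-size core (1.0 = maximum)
    rel_energy: float  # energy relative to the full-size core (1.0 = maximum)


def pick_config(configs, target_perf):
    """Return the lowest-energy configuration whose relative performance
    is at least target_perf (a fraction of maximum speed, e.g. 0.8)."""
    feasible = [c for c in configs if c.rel_perf >= target_perf]
    if not feasible:
        raise ValueError(f"no configuration reaches {target_perf:.0%} of maximum speed")
    return min(feasible, key=lambda c: c.rel_energy)


# Illustrative placeholder values only; they are NOT measurements from the paper.
CONFIGS = [
    CoreConfig("full", rel_perf=1.00, rel_energy=1.00),
    CoreConfig("medium", rel_perf=0.90, rel_energy=0.70),
    CoreConfig("small", rel_perf=0.75, rel_energy=0.50),
]

if __name__ == "__main__":
    for target in (1.0, 0.8, 0.5):
        chosen = pick_config(CONFIGS, target)
        print(f"target {target:.0%}: use '{chosen.name}' configuration")
```

With these placeholder values, a target of 80% rules out the smallest configuration because it reaches only 75% of maximum speed, so the medium configuration is chosen even though the smallest one uses less energy. The performance target acts as a hard constraint and energy is minimized only among the configurations that satisfy it.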
