Analyzing impact of multiple ABB and AVS domains on throughput of power and thermal-constrained multi-core processors

Recently, semiconductor industries have integrated more cores in a single die, which substantially improves the throughput of the processors running highly-parallel applications. However, many existing applications do not have high enough parallelism to exploit multiple cores in a die, slowing the transition to many-core processors with smaller and more cores that benefit future applications with high parallelism. In this paper, we analyze the impact of multiple adaptive voltage scaling (AVS) and adaptive body biasing (ABB) domains on the throughput of power and thermal-constrained multi-core processors when they are combined with per-core power-gating (PCPG). Both AVS and ABB can be effectively used to either increase frequency (thus throughput) or decrease power consumption of the processors. Meanwhile, PCPG can provide extra power and thermal headroom when application's parallelism is limited. First, we analyze the throughput impact of applying AVS, ABB, and PCPG for power and thermal constrained multi-core processors. Second, we investigate the impact of multiple AVS and ABB domains on the throughput, and recommend the most cost-effective number of domains for AVS and ABB in 16 and 8-core processors. Our analysis using the 32nm predictive technology model considering within-die variations suggests that the most cost-effective number of domains for AVS and/or ABB should be one for each when they are combined with PCPG in both 16 and 8-core processors. Since within-die core-to-core variations provide many choices in terms of core frequency and power consumption for limited-parallelism applications, one AVS or ABB domain can leads to the throughput improvement by 1.77∼2.49x; more than one AVS and/or ABB domains only improve the throughput marginally.

[1]  S. Narendra,et al.  1.1 V 1 GHz communications router with on-chip body bias in 150 nm CMOS , 2002, 2002 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No.02CH37315).

[2]  Margaret Martonosi,et al.  Power Efficiency for Variation-Tolerant Multicore Processors , 2006, ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design.

[3]  V. De,et al.  Statistical design for variation tolerance: key to continued Moore's law , 2004, 2004 International Conference on Integrated Circuit Design and Technology (IEEE Cat. No.04EX866).

[4]  Nam Sung Kim,et al.  Optimizing throughput of power- and thermal-constrained multicore processors using DVFS and per-core power-gating , 2009, 2009 46th ACM/IEEE Design Automation Conference.

[5]  Keith A. Bowman,et al.  Impact of die-to-die and within-die parameter variations on the throughput distribution of multi-core processors , 2007, Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07).

[6]  Mark D. Hill,et al.  Amdahl's Law in the Multicore Era , 2008, Computer.

[7]  Diana Marculescu,et al.  Characterizing chip-multiprocessor variability-tolerance , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[8]  Vivek De,et al.  Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage , 2002, 2002 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No.02CH37315).

[9]  Thomas R. Puzak,et al.  Optimum power/performance pipeline depth , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[10]  Costas J. Spanos,et al.  Modeling within-die spatial correlation effects for process-design co-optimization , 2005, Sixth international symposium on quality electronic design (isqed'05).

[11]  Meeta Sharma Gupta,et al.  System level analysis of fast, per-core DVFS using on-chip switching regulators , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[12]  Josep Torrellas,et al.  Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[13]  Josep Torrellas,et al.  Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[14]  Chaitali Chakrabarti,et al.  Throughput of multi-core processors under thermal constraints , 2007, Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07).

[15]  Kevin Skadron,et al.  Impact of Process Variations on Multicore Performance Symmetry , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.

[16]  Hsien-Hsin S. Lee,et al.  Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era , 2008, Computer.

[17]  James D. Meindl,et al.  Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration , 2002, IEEE J. Solid State Circuits.

[18]  Trevor N. Mudge,et al.  Total power-optimal pipelining and parallel processing under process variations in nanometer technology , 2005, ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005..

[19]  Keith A. Bowman,et al.  Measurements and modeling of intrinsic fluctuations in MOSFET threshold voltage , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005..

[20]  Ken Mai,et al.  The future of wires , 2001, Proc. IEEE.

[21]  Allan Hartstein,et al.  Optimum Power/Performance Pipeline Depth , 2003, MICRO.

[22]  Krste Asanovic,et al.  Power-optimal pipelining in deep submicron technology , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).