TEI-Turbo: Temperature Effect Inversion-Aware Turbo Boost for FinFET-Based Multi-Core Systems

Energy and temperature are the main constraints for modern high-performance multi-core systems. To save power or increase performance, Dynamic Voltage and Frequency Scaling (DVFS) is widely applied in industry. As CMOS technology continues to scale, FinFET has become the common choice for multi-core systems. In contrast with planar CMOS, FinFET devices exhibit lower delay at higher temperature in the super-threshold voltage region, an effect called temperature effect inversion (TEI). Because a warmer circuit can be clocked faster at the same voltage, TEI allows performance to be improved further under a power constraint. This work explores TEI-aware performance improvement for power-limited multi-core systems. Our experimental results show that a TEI-aware DVFS policy achieves an average steady-state throughput improvement of 15.70% over a TEI-agnostic one. On further investigation, we observe that TEI produces multiple sweet spots in the operating voltage/frequency space. Based on these sweet-spot operating regimes, this work introduces a fast algorithm that determines the maximum performance achievable under a power constraint. Experimental results confirm its effectiveness: it runs 45.9X faster on average while its resulting performance stays within 0.22% of existing state-of-the-art algorithms.
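The difference between a TEI-aware and a TEI-agnostic DVFS choice can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the model or algorithm from this work: the linear voltage-frequency relation, the TEI coefficient, the power model (which ignores the temperature dependence of leakage), and the names `f_max`, `power`, and `best_point` are all hypothetical. The TEI-agnostic policy is assumed to guard-band for the coldest (slowest) corner, while the TEI-aware policy uses the actual temperature.

```python
T_REF = 25.0  # deg C; assumed coldest corner, which is the slowest one under TEI


def f_max(v, temp_c, tei_coeff=0.002):
    """Toy maximum frequency (GHz) at supply voltage v (V) and temperature temp_c (deg C).

    Hypothetical model: frequency grows linearly with voltage, and TEI adds
    a small speedup per degree above T_REF.
    """
    base = 2.0 * (v - 0.3)
    return base * (1.0 + tei_coeff * (temp_c - T_REF))


def power(v, f, c_eff=1.5, k_leak=0.2):
    """Toy power (W): dynamic C*V^2*f plus a leakage term linear in V."""
    return c_eff * v * v * f + k_leak * v


def best_point(voltages, temp_c, budget_w, tei_aware):
    """Exhaustively pick the (voltage, frequency) pair with the highest
    frequency whose power fits the budget. Returns None if nothing fits."""
    # A TEI-agnostic policy assumes the worst-case (cold) delay everywhere.
    t = temp_c if tei_aware else T_REF
    best = None
    for v in voltages:
        f = f_max(v, t)
        if power(v, f) <= budget_w and (best is None or f > best[1]):
            best = (v, f)
    return best


levels = [0.5, 0.6, 0.7, 0.8, 0.9]
agnostic = best_point(levels, temp_c=75.0, budget_w=1.0, tei_aware=False)
aware = best_point(levels, temp_c=75.0, budget_w=1.0, tei_aware=True)
print("TEI-agnostic:", agnostic)
print("TEI-aware:   ", aware)
```

In this toy setting both policies settle on the same voltage level, but the TEI-aware one runs at a higher frequency because it does not pay the cold-corner guard band. The exhaustive loop stands in for whatever search a real policy would use; it is not the fast sweet-spot algorithm described above.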