Opportunistic Turbo Execution in NTC: Exploiting the paradigm shift in performance bottlenecks

In this paper, we investigate a shift in the performance bottlenecks of Near-Threshold Computing (NTC) processors. Our study demonstrates that the traditional memory latency bottleneck is largely superseded by the bottlenecks of Long Latency Datapaths (LLDs) within a processor core. To exploit this paradigm shift, we propose Opportunistic Turbo Execution (OTE). OTE dynamically boosts the performance of LLDs severalfold, improving both the performance and the energy efficiency of an NTC core. Using a comprehensive circuit-architectural analysis, we demonstrate a 42.2% improvement in energy efficiency over a recently proposed technique across a range of benchmarks.
