Assessing the Effects of Low Voltage in Branch Prediction Units

Branch Prediction Units (BPUs) are key performance components in modern microprocessors: they are widely used to address control hazards and minimize misprediction stalls. The continuous drive for higher performance has led designers to integrate highly sophisticated predictors with complex prediction algorithms and large storage requirements. As a result, BPUs in modern microprocessors consume large amounts of power. When a system operates under a limited power budget, however, critical decisions are required to strike a balance between the BPU and the rest of the microprocessor. In this work, we present a comprehensive analysis of the effects of low-voltage operation on BPUs. We propose a design with a separate voltage domain for the BPU, which exploits the BPU's speculative, self-correcting nature: a fault in a BPU array can only cause a misprediction, which the pipeline detects and recovers from, so power can be reduced without affecting functional correctness. Our study explores how several branch predictor implementations behave when aggressively undervolted, the performance impact of undervolting the Branch Target Buffer (BTB), and in which cases it is more efficient to reduce the branch predictor (BP) and BTB sizes instead of undervolting. Using a realistic protection implementation, we also show that protecting the BPU SRAM arrays has limited potential to further increase energy savings. Our results show that BPU undervolting can yield power savings of up to 69% and microprocessor energy savings of up to 12% before the performance-degradation penalty outweighs the benefits of low voltage. Neither smaller predictor sizes nor protection mechanisms can further reduce energy consumption.
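To make the self-correcting argument and the break-even point concrete, here is a minimal Python sketch under loudly stated assumptions. It is not the authors' simulation infrastructure: the bimodal predictor, the synthetic branch trace (`make_trace`), the stuck-at fault model (`inject_faults`), and the toy energy model (`relative_energy`, which assumes BPU power scales with V² and each misprediction costs a fixed number of cycles) are all hypothetical simplifications. Faults only change the predicted direction; the committed outcome always comes from branch resolution, so undervolting costs cycles rather than correctness, and energy savings vanish once the extra misprediction cycles outweigh the lower BPU power.

```python
import random
from dataclasses import dataclass, field


@dataclass
class BimodalPredictor:
    """Bimodal predictor: a table of 2-bit saturating counters indexed by PC bits."""
    entries: int = 1024
    # Hypothetical fault map {index: stuck counter value 0..3}, standing in for
    # SRAM cells that fail when the BPU voltage domain is lowered.
    stuck: dict = field(default_factory=dict)

    def __post_init__(self):
        self.table = [2] * self.entries  # every counter starts "weakly taken"

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def predict(self, pc):
        i = self._index(pc)
        # A faulty entry always reads back its stuck value.
        return self.stuck.get(i, self.table[i]) >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if i in self.stuck:
            return  # writes to a faulty entry are lost
        c = self.table[i]
        self.table[i] = min(c + 1, 3) if taken else max(c - 1, 0)


def make_trace(n, n_branches=256, noise=0.05, seed=0):
    """Synthetic (pc, taken) trace; each branch is strongly biased one way."""
    rng = random.Random(seed)
    pcs = [0x400000 + 4 * k for k in range(n_branches)]
    bias = {pc: rng.random() < 0.5 for pc in pcs}
    trace = []
    for _ in range(n):
        pc = rng.choice(pcs)
        taken = (not bias[pc]) if rng.random() < noise else bias[pc]
        trace.append((pc, taken))
    return trace


def inject_faults(entries, fault_rate, seed=1):
    """Mark each entry faulty with probability fault_rate, stuck at a random value."""
    rng = random.Random(seed)
    return {i: rng.randrange(4) for i in range(entries) if rng.random() < fault_rate}


def run(trace, predictor):
    """Simulate predict/resolve; return (mispredictions, committed outcomes)."""
    mispredicts, committed = 0, []
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            mispredicts += 1  # flush and refetch: costs cycles, not correctness
        committed.append(taken)  # architectural state follows the resolved outcome
        predictor.update(pc, taken)
    return mispredicts, committed


def relative_energy(v_ratio, bpu_share, mispredicts, base_mispredicts,
                    base_cycles, penalty):
    """Energy relative to nominal voltage (1.0 = unchanged) under a toy model:
    BPU power scales with V**2 (dynamic power only), the rest of the core's
    power is fixed, and each misprediction adds `penalty` cycles."""
    power = (1 - bpu_share) + bpu_share * v_ratio ** 2
    cycles = base_cycles + mispredicts * penalty
    base = base_cycles + base_mispredicts * penalty
    return power * cycles / base


trace = make_trace(20_000)
m_clean, out_clean = run(trace, BimodalPredictor())
m_faulty, out_faulty = run(trace, BimodalPredictor(stuck=inject_faults(1024, 0.3)))
assert out_clean == out_faulty  # identical committed results despite faults
print("mispredictions:", m_clean, "fault-free vs", m_faulty, "faulty")
print("relative energy:", relative_energy(0.8, 0.1, m_faulty, m_clean, 40_000, 15))
```

Sweeping `fault_rate` (a stand-in for deeper undervolting) together with `v_ratio` shows the trade-off only qualitatively; the voltage-to-fault-rate relationship and the savings figures reported above come from the authors' evaluation, not from this model.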
