Comprehensive Evaluation of Supply Voltage Underscaling in FPGA On-Chip Memories

In this work, we evaluate aggressive undervolting, i.e., supply voltage scaling below the nominal level, to reduce the energy consumption of Field Programmable Gate Arrays (FPGAs). Chip vendors typically add voltage guardbands to guarantee correct operation under worst-case process and environmental conditions. Through experiments on several FPGA architectures, we measure this voltage guardband to be on average 39% of the nominal level; eliminating it delivers more than an order of magnitude in power savings. However, undervolting below the guardband can cause reliability issues as circuit delays increase, i.e., faults start to appear. We extensively characterize the behavior of these faults in terms of rate, location, type, and sensitivity to environmental temperature, focusing on on-chip memories, or Block RAMs (BRAMs). Finally, we evaluate a typical FPGA-based Neural Network (NN) accelerator under low-voltage BRAM operation. We find that the substantial NN energy savings come at the cost of NN accuracy loss. To attain these power savings without accuracy loss, we propose a novel technique that exploits the deterministic behavior of undervolting faults and limits the accuracy loss to 0.1% without any timing-slack overhead.
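The abstract names the fault-aware mitigation only at a high level. The sketch below illustrates one plausible realization under stated assumptions: faults below the guardband are deterministic and have been characterized offline into a per-bitcell fault map, and NN weights are then placed so that the most impactful weights avoid the worst faulty BRAM words. All identifiers (`build_fault_map`, `place_weights`, `WORD_BITS`) and the MSB-avoidance placement policy are illustrative assumptions, not the paper's exact mechanism.

```python
# Minimal sketch of fault-map-aware weight placement in undervolted BRAMs.
# Assumes faults are deterministic and characterized offline into a
# per-bitcell fault map; all names and policies here are hypothetical.
import numpy as np

WORD_BITS = 16  # assumed fixed-point weight width


def build_fault_map(num_words, faulty_bits):
    """Boolean map [num_words, WORD_BITS]; True marks a bitcell that
    flips when the BRAM is operated below the voltage guardband."""
    fmap = np.zeros((num_words, WORD_BITS), dtype=bool)
    for word, bit in faulty_bits:
        fmap[word, bit] = True
    return fmap


def place_weights(weights, fmap):
    """Greedy placement: rank BRAM words by the significance of their
    worst faulty bit, then give the cleanest words to the weights with
    the largest magnitude. Requires len(weights) <= number of words.
    Returns the word index assigned to each weight."""
    # Cost of a word = value of its most significant faulty bit (0 if clean).
    cost = np.where(fmap, 2 ** np.arange(WORD_BITS), 0).max(axis=1)
    order = np.argsort(cost)                # cleanest words first
    priority = np.argsort(-np.abs(weights))  # largest-|w| weights first
    placement = np.empty(len(weights), dtype=int)
    placement[priority] = order[: len(weights)]
    return placement


# Example: 4 weights, an 8-word BRAM with two known faulty bitcells
# (word 2: MSB faulty; word 5: LSB faulty).
fmap = build_fault_map(8, {(2, 15), (5, 0)})
w = np.array([0.9, -0.05, 0.4, 0.01])
print(place_weights(w, fmap))  # largest weights land in fault-free words
```

Because the fault map is fixed for a given device and voltage, such a placement can be computed once at compile time, which is consistent with the abstract's claim of avoiding any timing-slack overhead at run time.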
