Aggressive undervolting of FPGAs: power & reliability trade-offs

In this work, we evaluate aggressive undervolting, i.e., scaling the supply voltage below the nominal level, to reduce the energy consumption of Field Programmable Gate Arrays (FPGAs). Chip vendors usually add voltage guardbands to ensure correct operation under worst-case process and environmental conditions. Through experiments on several FPGA architectures, we confirm a large voltage guardband for several FPGA components, which in turn delivers significant power savings. However, undervolting further, below the voltage guardband, may cause reliability issues as a result of increased circuit delay, and faults may start to appear. We extensively characterize the behavior of these faults in terms of rate, location, and type, as well as sensitivity to environmental temperature, focusing primarily on FPGA on-chip memories, or Block RAMs (BRAMs). Understanding this behavior allows efficient mitigation techniques to be deployed, and in turn, FPGA-based designs can be improved for better energy, reliability, and performance trade-offs. Finally, as a case study, we evaluate a typical FPGA-based Neural Network (NN) accelerator when the FPGA voltage is underscaled. Below the voltage guardband, the substantial NN energy savings come at the cost of NN accuracy loss. To attain power savings below the voltage guardband without NN accuracy loss, we propose an application-aware technique and also evaluate the built-in Error-Correcting Code (ECC) mechanism. First, we develop an application-dependent BRAM placement technique that relies on the deterministic behavior of undervolting faults and mitigates them by mapping the most reliability-sensitive NN parameters to BRAM blocks that are relatively more resistant to undervolting faults. Second, as a more general technique, we apply the built-in ECC of BRAMs and observe a significant fault coverage capability, thanks to the behavior of undervolting faults, with a negligible power consumption overhead.
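
The placement idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a per-BRAM fault rate has already been measured at the target voltage and that each group of NN parameters has a scalar sensitivity score. The function name `place_by_sensitivity`, the dictionary layout, and all numbers are hypothetical.

```python
from typing import Dict


def place_by_sensitivity(
    param_sensitivity: Dict[str, float],
    bram_fault_rate: Dict[str, float],
) -> Dict[str, str]:
    """Greedily map the most sensitive parameter group to the least faulty BRAM.

    param_sensitivity: hypothetical score per NN parameter group
                       (higher = accuracy suffers more from bit flips).
    bram_fault_rate:   hypothetical measured fault rate per BRAM block at the
                       chosen undervolted level (lower = more resistant).
    """
    if len(param_sensitivity) > len(bram_fault_rate):
        raise ValueError("not enough BRAM blocks for all parameter groups")

    # Most sensitive parameter groups first.
    groups = sorted(param_sensitivity, key=param_sensitivity.get, reverse=True)
    # Most fault-resistant BRAM blocks first.
    brams = sorted(bram_fault_rate, key=bram_fault_rate.get)

    # Pair them in rank order; surplus BRAM blocks are left unused.
    return dict(zip(groups, brams))


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    sensitivity = {"layer0": 0.1, "layer1": 0.5, "layer2": 0.9}
    fault_rate = {"bram_a": 0.02, "bram_b": 0.0, "bram_c": 0.3, "bram_d": 0.01}
    print(place_by_sensitivity(sensitivity, fault_rate))
```

Such a static mapping only makes sense because the undervolting faults are reported to be deterministic: a fault map measured once stays valid for later runs at the same voltage.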
