AOS: An Automated Overclocking System for High-Performance CNN Accelerator Through Timing Delay Measurement on FPGA

With the inherent algorithmic error resilience of convolutional neural networks (CNNs) and the worst-case design methodologies of current electronic design automation tools, overclocking-based timing speculation is a promising technique for improving the performance of CNN accelerators on FPGAs by removing unnecessary timing margins. To avoid potential timing errors, timing delay measurement should be used during overclocking. However, current approaches struggle to measure paths subject to strong variability factors such as jitter, and they lack an automated process for testing circuit delays. In this article, we first propose two-dimensional multiframe fusion to deal with sampling jitter, and then present a timing delay measurement-based automatic overclocking system (AOS) running on a heterogeneous FPGA for high-performance CNN accelerators. On the FPGA side, AOS is composed of timing delay monitors (TDMs) that can measure all types of timing paths and a TDM controller that converts the sampled TDM values into timing delay expressed as the ratio of path delay to clock period. On the CPU side, AOS converts the path delay from a clock-period ratio to an absolute delay value and decides the accelerator frequency for the next iteration. We demonstrate AOS with a SkyNet accelerator on the Xilinx ZCU104 board and achieve 657 FPS at 436 MHz without accuracy degradation, a $1.41\times $ speedup over the baseline.
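The CPU-side step described above — converting the TDM ratio reading into an absolute delay and choosing the next iteration's frequency — can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the fixed guard-band policy are hypothetical, not the paper's actual implementation, which the abstract does not specify.

```python
def ratio_to_delay_ns(sampled_ratio: float, clock_period_ns: float) -> float:
    """Convert a TDM reading (path delay / clock period) to nanoseconds."""
    return sampled_ratio * clock_period_ns

def next_frequency_mhz(worst_ratio: float, current_freq_mhz: float,
                       guard_band: float = 0.95) -> float:
    """Pick the next accelerator frequency from the worst measured path.

    worst_ratio: largest (path delay / clock period) reported by the TDMs.
    guard_band:  fraction of the new clock period the critical path may
                 occupy (illustrative margin, not the paper's policy).
    """
    period_ns = 1e3 / current_freq_mhz              # MHz -> ns
    delay_ns = ratio_to_delay_ns(worst_ratio, period_ns)
    # Size the new period so the critical path fits within the guard band.
    new_period_ns = delay_ns / guard_band
    return 1e3 / new_period_ns
```

For example, if the worst path occupies half of a 5 ns period at 200 MHz (2.5 ns), the sketch would raise the clock until that path fills 95% of the period, i.e. to roughly 380 MHz; a real iterative system would converge on this over successive measurement rounds rather than in one jump.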
