Paper information - DycSe: A Low-Power, Dynamic Reconfiguration Column Streaming-Based Convolution Engine for Resource-Aware Edge AI Accelerators

Edge AI accelerators speed up computation in edge AI devices such as image recognition sensors in robotics, door locks, drones, and remote sensing satellites. Instead of relying on a general-purpose processor (GPP) or graphics processing unit (GPU), an edge AI accelerator uses a customized design to meet the requirements of the edge environment. These requirements include real-time processing, low power consumption, and resource awareness, whether the constraint is the resources available on a field-programmable gate array (FPGA) or a limited application-specific integrated circuit (ASIC) area. System reliability (e.g., tolerance of permanent faults) is also essential when the devices target radiation environments such as space and nuclear power stations. To meet these requirements, this paper proposes a dynamically reconfigurable column streaming-based convolution engine (DycSe) with programmable adder modules for low-power, resource-aware edge AI accelerators. DycSe does not target the FPGA platform only; it is an intellectual property (IP) core design, and the FPGA platform in this paper serves only to prototype and evaluate the design. The power consumption and resource usage of DycSe are evaluated with the Vivado synthesis tool. Because the synthesis tool reports results only for the final, complete system at the design stage, we compare DycSe to a commercial edge AI accelerator as a cross-reference to other state-of-the-art works. The commercial architecture offers competitive performance within the low-power ultra-small (LPUS) edge AI scope. The results show that DycSe consumes 3.56% less power with a slight (1%) resource overhead, while adding reconfigurable flexibility.

T. Arslan | W. Lin | Yajun Zhu
