An Efficient Task Assignment Framework to Accelerate DPU-Based Convolutional Neural Network Inference on FPGAs

The field-programmable gate array (FPGA) has become an efficient accelerator for convolutional neural network (CNN) inference thanks to its high performance and flexibility. To further improve the performance of CNN inference on FPGAs, Xilinx has released an intellectual property (IP) core called the Deep Learning Processor Unit (DPU). Unlike previous FPGA-based hardware designs that target specific functions or specific CNNs, the DPU IP supports a broad set of basic deep learning functions, so developers can use DPUs to accelerate CNN inference conveniently. In a DPU-based CNN acceleration platform, an encapsulated scheduler plays a crucial role in scheduling tasks between the ARM processor and multiple DPUs, which together form a heterogeneous system. However, the current scheduler is unsatisfactory because of its low schedule efficiency. This paper therefore presents a high-performance task assignment framework built on Xilinx hybrid CPU-FPGA MPSoC devices. We first evaluate the main causes of the low schedule efficiency. We then explore the scheduler's rules and improve schedule efficiency through targeted observation and analysis. Finally, we integrate our optimizations and propose an efficient task assignment framework that maximizes performance on the DPU-based CNN acceleration platform. Experimental results on the Xilinx Zynq UltraScale+ MPSoC ZCU104 show that, compared with the original scheduling strategy, our framework significantly boosts schedule efficiency for small-scale CNNs (from 36% to 70%), medium-scale CNNs (from 65% to 95%), and large-scale CNNs (from 77% to 99%).
