Real-Time SSDLite Object Detection on FPGA

Deep neural network (DNN)-based object detection has been investigated and applied to various real-time applications. However, DNNs are hard to deploy in embedded systems because of their high computational complexity and deep-layered structure. Although several field-programmable gate array (FPGA) implementations for real-time object detection have been presented recently, they suffer from either low throughput or low detection accuracy. In this article, we propose an efficient computing system for real-time SSDLite object detection on FPGA devices, which includes a novel hardware architecture and system optimization techniques. In the proposed hardware architecture, a neural processing unit (NPU) is designed to accelerate DNNs efficiently; it consists of heterogeneous units, such as band processing, scaling, accumulating, and data fetching and formatting units. In addition, system optimization techniques are presented to further improve the throughput: a task control unit balances the workload and increases the utilization of the heterogeneous units in the NPU, and the object detection algorithm is refined accordingly. The proposed architecture is realized on an Intel Arria 10 FPGA and improves the throughput by up to $13.6\times$ compared to the state-of-the-art FPGA implementation.
