A novel hardware-oriented ultra-high-speed object detection algorithm based on convolutional neural network

This paper describes a hardware-oriented two-stage algorithm that can be deployed on a resource-limited field-programmable gate array (FPGA) for fast object detection and recognition without external memory. The first stage proposes bounding boxes with a conventional object detection method, and the second stage applies convolutional neural network (CNN)-based classification to improve accuracy. Frequent access to external memory significantly reduces the execution efficiency of object classification, yet existing CNN models have too many parameters to fit in FPGAs with limited on-chip memory. In this study, we designed a compact CNN model and applied hardware-oriented quantization to its parameters and intermediate results. As a result, CNN-based ultra-fast object classification was realized with all parameters and intermediate results stored on chip. Several evaluations were performed to demonstrate the performance of the proposed algorithm. The object classification module consumes only 163.67 Kbits of on-chip memory for ten regions of interest (ROIs), which makes it suitable for low-end FPGA devices. In terms of accuracy, our method achieves a correctness rate of 98.01% on the open-source MNIST data set and over 96.5% on three other self-built data sets, which is distinctly better than conventional ultra-high-speed object detection algorithms.
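The abstract does not give the quantization scheme, so the following is only a minimal sketch of one common hardware-oriented approach: signed fixed-point quantization with saturation, plus a simple count of the storage bits it implies. The function names, the 8-bit word length, and the 6 fractional bits are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch only: the word length (8 bits) and the number of
# fractional bits (6) are assumptions, not the paper's actual settings.

def quantize_fixed_point(values, total_bits=8, frac_bits=6):
    """Map floats to signed fixed-point integer codes, saturating on overflow."""
    scale = 1 << frac_bits                # 2**frac_bits steps per unit
    lo = -(1 << (total_bits - 1))         # most negative representable code
    hi = (1 << (total_bits - 1)) - 1      # most positive representable code
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize_fixed_point(codes, frac_bits=6):
    """Recover the real values that the integer codes represent."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


def storage_bits(num_values, total_bits=8):
    """On-chip storage needed for num_values words of total_bits each."""
    return num_values * total_bits


# Hypothetical weights; the last two fall outside the representable range.
weights = [0.5, -0.25, 1.999, -3.0]
codes = quantize_fixed_point(weights)
restored = dequantize_fixed_point(codes)
```

With these assumed settings, values outside roughly [-2, 2) saturate rather than wrap around, which is the usual choice in fixed-point hardware. Shortening the word length reduces the on-chip memory needed for parameters and intermediate results in direct proportion.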
