Efficient YOLO: A Lightweight Model for Embedded Deep Learning Object Detection

Efficiency is essential for on-road object detection. To deploy a deep model on embedded devices while maintaining high accuracy, this paper rebuilds an Efficient YOLO framework on top of the standard YOLOv3. First, an iterative initialization strategy is designed to ensure network sparsity at the start of training. Then, comprehensive pruning schemes, including layer-level and channel-wise pruning, are proposed to reduce the number of model parameters. With the support of an external dataset, detection accuracy remains high. Compared with the original version, our model reduces the model size by 96.93% and the amount of computation by 84.36%. Inference speed improves by a factor of 2.23 on the NVIDIA Jetson TX2 platform. Finally, we achieve a mAP of 0.492 on the testing dataset, ranking first in accuracy in the ICME competition.
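To make the idea of channel-wise pruning concrete, the following is a minimal sketch of magnitude-based channel selection. It is an illustration under stated assumptions, not the procedure used in this paper: it assumes each channel has a learned scaling factor (for example, a batch-normalization scale) and that channels are ranked with a single global threshold. The function name `select_channels` and the parameter `prune_ratio` are hypothetical.

```python
def select_channels(scales_per_layer, prune_ratio):
    """Return, for each layer, the indices of the channels to keep.

    Assumption (not from the paper): a channel's importance is the
    absolute value of its scaling factor, and one global threshold is
    applied across all layers.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")

    # Pool every channel's magnitude to find the global cut-off value.
    magnitudes = sorted(abs(s) for layer in scales_per_layer for s in layer)
    cut = int(len(magnitudes) * prune_ratio)
    threshold = magnitudes[cut - 1] if cut > 0 else float("-inf")

    kept = []
    for layer in scales_per_layer:
        indices = [i for i, s in enumerate(layer) if abs(s) > threshold]
        if not indices:
            # Keep the strongest channel so the layer is never emptied.
            indices = [max(range(len(layer)), key=lambda i: abs(layer[i]))]
        kept.append(indices)
    return kept


# Two hypothetical layers with three channels each; prune half of all channels.
example_scales = [[0.9, 0.01, 0.5], [0.02, 0.03, 0.7]]
print(select_channels(example_scales, prune_ratio=0.5))  # prints [[0, 2], [2]]
```

The fallback that retains one channel per layer is a design choice of this sketch: removing a whole layer is a separate decision, which the abstract treats as layer-level pruning.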
