论文信息 - LCDet: Low-Complexity Fully-Convolutional Neural Networks for Object Detection in Embedded Systems

LCDet: Low-Complexity Fully-Convolutional Neural Networks for Object Detection in Embedded Systems

Deep Convolutional Neural Networks (CNN) are the state-of-the-art performers for the object detection task. It is well known that object detection requires more com- putation and memory than image classification. In this work, we propose LCDet, a fully-convolutional neural net- work for generic object detection that aims to work in em- bedded systems. We design and develop an end-to-end TensorFlow(TF)-based model. The detection works by a single forward pass through the network. Additionally, we employ 8-bit quantization on the learned weights. As a use case, we choose face detection and train the proposed model on images containing a varying number of faces of different sizes. We evaluate the face detection perfor- mance on publicly available dataset FDDB and Widerface. Our experimental results show that the proposed method achieves comparative accuracy comparing with state-of- the-art CNN-based face detection methods while reducing the model size by 3× and memory-BW by 3 - 4× compar- ing with one of the best real-time CNN-based object de- tector YOLO [23]. Our 8-bit fixed-point TF-model pro- vides additional 4× memory reduction while keeping the accuracy nearly as good as the floating point model and achieves 20× performance gain compared to the floating point model. Thus the proposed model is amenable for em- bedded implementations and is generic to be extended to any number of categories of objects.

[1] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2] Joel Emer,et al. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks , 2016, CARN.

[3] Huaizu Jiang,et al. Face Detection with the Faster R-CNN , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[4] Xiaolin Hu,et al. Joint Training of Cascaded CNN for Face Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Yu Wang,et al. Towards Real-Time Object Detection on Embedded Systems , 2018, IEEE Transactions on Emerging Topics in Computing.

[7] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[8] Sergio Guadarrama,et al. Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Erik Learned-Miller,et al. FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[10] Forrest N. Iandola,et al. Shallow Networks for High-accuracy Road Object-detection , 2016, VEHITS.

[11] Jian Cheng,et al. Quantized Convolutional Neural Networks for Mobile Devices , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[13] Vincent Vanhoucke,et al. Improving the speed of neural networks on CPUs , 2011 .

[14] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.

[15] Soheil Ghiasi,et al. Hardware-oriented Approximation of Convolutional Neural Networks , 2016, ArXiv.

[16] Yizhou Wang,et al. Face Detection with End-to-End Integration of a ConvNet and a 3D Model , 2016, ECCV.

[17] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18] Tao Zhang,et al. Bootstrapping Face Detection with Hard Negative Examples , 2016, ArXiv.

[19] Forrest N. Iandola,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[20] Shuo Yang,et al. WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Yoshua Bengio,et al. Low precision arithmetic for deep learning , 2014, ICLR.

[22] Yu Wang,et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.

[23] Forrest N. Iandola,et al. SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[24] Steven C. H. Hoi,et al. Face Detection using Deep Learning: An Improved Faster RCNN Approach , 2017, Neurocomputing.

[25] Bin Yang,et al. CRAFT Objects from Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[27] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[28] Shuo Yang,et al. From Facial Parts Responses to Face Detection: A Deep Learning Approach , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29] Nikos Komodakis,et al. Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30] Dumitru Erhan,et al. Scalable, High-Quality Object Detection , 2014, ArXiv.

[31] Yuning Jiang,et al. UnitBox: An Advanced Object Detection Network , 2016, ACM Multimedia.

[32] Ali Farhadi,et al. YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Philipp Gysel,et al. Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks , 2016, ArXiv.