LCDet: Low-Complexity Fully-Convolutional Neural Networks for Object Detection in Embedded Systems

Deep Convolutional Neural Networks (CNN) are the state-of-the-art performers for the object detection task. It is well known that object detection requires more com- putation and memory than image classification. In this work, we propose LCDet, a fully-convolutional neural net- work for generic object detection that aims to work in em- bedded systems. We design and develop an end-to-end TensorFlow(TF)-based model. The detection works by a single forward pass through the network. Additionally, we employ 8-bit quantization on the learned weights. As a use case, we choose face detection and train the proposed model on images containing a varying number of faces of different sizes. We evaluate the face detection perfor- mance on publicly available dataset FDDB and Widerface. Our experimental results show that the proposed method achieves comparative accuracy comparing with state-of- the-art CNN-based face detection methods while reducing the model size by 3× and memory-BW by 3 - 4× compar- ing with one of the best real-time CNN-based object de- tector YOLO [23]. Our 8-bit fixed-point TF-model pro- vides additional 4× memory reduction while keeping the accuracy nearly as good as the floating point model and achieves 20× performance gain compared to the floating point model. Thus the proposed model is amenable for em- bedded implementations and is generic to be extended to any number of categories of objects.

[1]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Joel Emer,et al.  Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks , 2016, CARN.

[3]  Huaizu Jiang,et al.  Face Detection with the Faster R-CNN , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[4]  Xiaolin Hu,et al.  Joint Training of Cascaded CNN for Face Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Yu Wang,et al.  Towards Real-Time Object Detection on Embedded Systems , 2018, IEEE Transactions on Emerging Topics in Computing.

[7]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[8]  Sergio Guadarrama,et al.  Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Erik Learned-Miller,et al.  FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[10]  Forrest N. Iandola,et al.  Shallow Networks for High-accuracy Road Object-detection , 2016, VEHITS.

[11]  Jian Cheng,et al.  Quantized Convolutional Neural Networks for Mobile Devices , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[13]  Vincent Vanhoucke,et al.  Improving the speed of neural networks on CPUs , 2011 .

[14]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[15]  Soheil Ghiasi,et al.  Hardware-oriented Approximation of Convolutional Neural Networks , 2016, ArXiv.

[16]  Yizhou Wang,et al.  Face Detection with End-to-End Integration of a ConvNet and a 3D Model , 2016, ECCV.

[17]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Tao Zhang,et al.  Bootstrapping Face Detection with Hard Negative Examples , 2016, ArXiv.

[19]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[20]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Yoshua Bengio,et al.  Low precision arithmetic for deep learning , 2014, ICLR.

[22]  Yu Wang,et al.  Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.

[23]  Forrest N. Iandola,et al.  SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[24]  Steven C. H. Hoi,et al.  Face Detection using Deep Learning: An Improved Faster RCNN Approach , 2017, Neurocomputing.

[25]  Bin Yang,et al.  CRAFT Objects from Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[27]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[28]  Shuo Yang,et al.  From Facial Parts Responses to Face Detection: A Deep Learning Approach , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Nikos Komodakis,et al.  Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  Dumitru Erhan,et al.  Scalable, High-Quality Object Detection , 2014, ArXiv.

[31]  Yuning Jiang,et al.  UnitBox: An Advanced Object Detection Network , 2016, ACM Multimedia.

[32]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Philipp Gysel,et al.  Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks , 2016, ArXiv.