Real-Time Multi-Scale Face Detector on Embedded Devices

Face detection is the basic step in video face analysis and has been studied for many years. However, achieving real-time performance on computation-resource-limited embedded devices still remains an open challenge. To address this problem, in this paper we propose a face detector, EagleEye, which shows a good trade-off between high accuracy and fast speed on the popular embedded device with low computation power (e.g., the Raspberry Pi 3b+). The EagleEye is designed to have low floating-point operations per second (FLOPS) as well as enough capacity, and its accuracy is further improved without adding too much FLOPS. Specifically, we design five strategies for building efficient face detectors with a good balance of accuracy and running speed. The first two strategies help to build a detector with low computation complexity and enough capacity. We use convolution factorization to change traditional convolutions into more sparse depth-wise convolutions to save computation costs and we use successive downsampling convolutions at the beginning of the face detection network. The latter three strategies significantly improve the accuracy of the light-weight detector without adding too much computation costs. We design an efficient context module to utilize context information to benefit the face detection. We also adopt information preserving activation function to increase the network capacity. Finally, we use focal loss to further improve the accuracy by handling the class imbalance problem better. Experiments show that the EagleEye outperforms the other face detectors with the same order of computation costs, on both runtime efficiency and accuracy.

[1]  Ming Tang,et al.  Improved Single Shot Object Detector Using Enhanced Features and Predicting Heads , 2018, 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM).

[2]  Xiangyu Zhang,et al.  ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.

[3]  Luc Van Gool,et al.  Face Detection without Bells and Whistles , 2014, ECCV.

[4]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[5]  Hao Wang,et al.  Detecting Faces Using Region-based Fully Convolutional Networks , 2017 .

[6]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[7]  Haifeng Shen,et al.  Learning Better Features for Face Detection with Feature Fusion and Segmentation Supervision , 2018, ArXiv.

[8]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[9]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Rama Chellappa,et al.  A Real-Time Multi-Task Single Shot Face Detector , 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).

[11]  Yizhou Wang,et al.  Face Detection with End-to-End Integration of a ConvNet and a 3D Model , 2016, ECCV.

[12]  Gianluca Francini,et al.  Learning Sparse Neural Networks via Sensitivity-Driven Regularization , 2018, NeurIPS.

[13]  Junjie Yan,et al.  Face detection by structural models , 2014, Image Vis. Comput..

[14]  Larry S. Davis,et al.  SSH: Single Stage Headless Face Detector , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Yuning Jiang,et al.  UnitBox: An Advanced Object Detection Network , 2016, ACM Multimedia.

[16]  Gang Hua,et al.  Supervised Transformer Network for Efficient Face Detection , 2016, ECCV.

[17]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Li-Jia Li,et al.  Multi-view Face Detection Using Deep Convolutional Neural Networks , 2015, ICMR.

[19]  Shifeng Zhang,et al.  S^3FD: Single Shot Scale-Invariant Face Detector , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Gang Yu,et al.  Face Attention Network: An Effective Face Detector for the Occluded Faces , 2017, ArXiv.

[21]  Shuo Yang,et al.  From Facial Parts Responses to Face Detection: A Deep Learning Approach , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Tianqi Chen,et al.  Empirical Evaluation of Rectified Activations in Convolutional Network , 2015, ArXiv.

[23]  Shifeng Zhang,et al.  Selective Refinement Network for High Performance Face Detection , 2018, AAAI.

[24]  Anastasios Tefas,et al.  A Fast Deep Convolutional Neural Network for Face Detection in Big Visual Data , 2016, INNS Conference on Big Data.

[25]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Jiri Matas,et al.  Weighted Sampling for Large-Scale Boosting , 2008, BMVC.

[28]  Steven C. H. Hoi,et al.  Feature Agglomeration Networks for Single Stage Face Detection , 2017, Neurocomputing.

[29]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[30]  Rama Chellappa,et al.  A deep pyramid Deformable Part Model for face detection , 2015, 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[31]  Xuelong Li,et al.  PixelLink: Detecting Scene Text via Instance Segmentation , 2018, AAAI.

[32]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[33]  Charless C. Fowlkes,et al.  Occlusion Coherence: Detecting and Localizing Occluded Faces , 2015, ArXiv.

[34]  Bin Yang,et al.  Aggregate channel features for multi-view face detection , 2014, IEEE International Joint Conference on Biometrics.

[35]  Rama Chellappa,et al.  HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Ran Tao,et al.  Seeing Small Faces from Robust Anchor's Perspective , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[38]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[39]  Shengcai Liao,et al.  A Fast and Accurate Unconstrained Face Detector , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Mohan M. Trivedi,et al.  To boost or not to boost? On the limits of boosted trees for object detection , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[42]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[43]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[44]  Р Ю Чуйков,et al.  Обнаружение транспортных средств на изображениях загородных шоссе на основе метода Single shot multibox Detector , 2017 .

[45]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Gang Hua,et al.  Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation , 2013, 2013 IEEE International Conference on Computer Vision.

[49]  Yunhong Wang,et al.  Receptive Field Block Net for Accurate and Fast Object Detection , 2017, ECCV.

[50]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[51]  Jianguo Li,et al.  Learning SURF Cascade for Fast and Accurate Object Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  C. V. Jawahar,et al.  Visual Phrases for Exemplar Face Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[53]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Quoc V. Le,et al.  Searching for Activation Functions , 2018, arXiv.

[55]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Erik Learned-Miller,et al.  FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[57]  Gang Hua,et al.  Efficient Boosted Exemplar-Based Face Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[58]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Shifeng Zhang,et al.  FaceBoxes: A CPU real-time face detector with high accuracy , 2017, 2017 IEEE International Joint Conference on Biometrics (IJCB).

[60]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[61]  Xu Tang,et al.  PyramidBox: A Context-assisted Single Shot Face Detector , 2018, ECCV.

[62]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[63]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[64]  Xiang Xu,et al.  Robust and High Performance Face Detector , 2019, ArXiv.

[65]  Shifeng Zhang,et al.  Single Shot Attention-Based Face Detector , 2018, CCBR.

[66]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[67]  Peiyun Hu,et al.  Finding Tiny Faces , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).