S^3FD: Single Shot Scale-Invariant Face Detector

This paper presents a real-time face detector, named Single Shot Scale-invariant Face Detector (S3FD), which performs superiorly on various scales of faces with a single deep neural network, especially for small faces. Specifically, we try to solve the common problem that anchorbased detectors deteriorate dramatically as the objects become smaller. We make contributions in the following three aspects: 1) proposing a scale-equitable face detection framework to handle different scales of faces well. We tile anchors on a wide range of layers to ensure that all scales of faces have enough features for detection. Besides, we design anchor scales based on the effective receptive field and a proposed equal proportion interval principle; 2) improving the recall rate of small faces by a scale compensation anchor matching strategy; 3) reducing the false positive rate of small faces via a max-out background label. As a consequence, our method achieves state-of-theart detection performance on all the common face detection benchmarks, including the AFW, PASCAL face, FDDB and WIDER FACE datasets, and can run at 36 FPS on a Nvidia Titan X (Pascal) for VGA-resolution images.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Xiaolin Hu,et al.  Joint Training of Cascaded CNN for Face Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Vladimir Pavlovic,et al.  Face tracking and recognition with visual constraints in real-world videos , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jianguo Li,et al.  Learning SURF Cascade for Fast and Accurate Object Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  C. V. Jawahar,et al.  Visual Phrases for Exemplar Face Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[7]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[9]  James M. Rehg,et al.  On the Design of Cascades of Boosted Ensembles for Face Detection , 2008, International Journal of Computer Vision.

[10]  Tat-Jen Cham,et al.  Fast training and selection of Haar features using statistics in boosting-based face detection , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[11]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[12]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[15]  Rama Chellappa,et al.  HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Junjie Yan,et al.  The Fastest Deformable Part Model for Object Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Charless C. Fowlkes,et al.  Occlusion Coherence: Detecting and Localizing Occluded Faces , 2015, ArXiv.

[19]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Tao Zhang,et al.  Bootstrapping Face Detection with Hard Negative Examples , 2016, ArXiv.

[21]  Bin Yang,et al.  Aggregate channel features for multi-view face detection , 2014, IEEE International Joint Conference on Biometrics.

[22]  Huaizu Jiang,et al.  Face Detection with the Faster R-CNN , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[23]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[25]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[26]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Luc Van Gool,et al.  Face Detection without Bells and Whistles , 2014, ECCV.

[28]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[29]  Gang Hua,et al.  A convolutional neural network cascade for face detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Raquel Urtasun,et al.  Understanding the Effective Receptive Field in Deep Convolutional Neural Networks , 2016, NIPS.

[32]  Shengcai Liao,et al.  A Fast and Accurate Unconstrained Face Detector , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Marios Savvides,et al.  CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection , 2016, ArXiv.

[34]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[35]  Adrian Barbu,et al.  Face Detection with a 3D Model , 2016 .

[36]  Mohan M. Trivedi,et al.  To boost or not to boost? On the limits of boosted trees for object detection , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[37]  Xiangyu Zhu,et al.  High-fidelity Pose and Expression Normalization for face recognition in the wild , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Xiaogang Wang,et al.  Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[40]  Rama Chellappa,et al.  A deep pyramid Deformable Part Model for face detection , 2015, 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[41]  Junjie Yan,et al.  Face detection by structural models , 2014, Image Vis. Comput..

[42]  Yuning Jiang,et al.  UnitBox: An Advanced Object Detection Network , 2016, ACM Multimedia.

[43]  Sergio Guadarrama,et al.  Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Steven C. H. Hoi,et al.  Face Detection using Deep Learning: An Improved Faster RCNN Approach , 2017, Neurocomputing.

[45]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[46]  Gang Hua,et al.  Supervised Transformer Network for Efficient Face Detection , 2016, ECCV.

[47]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Andrew G. Howard,et al.  Some Improvements on Deep Convolutional Neural Network Based Image Classification , 2013, ICLR.

[49]  Li-Jia Li,et al.  Multi-view Face Detection Using Deep Convolutional Neural Networks , 2015, ICMR.

[50]  Yizhou Wang,et al.  Face Detection with End-to-End Integration of a ConvNet and a 3D Model , 2016, ECCV.

[51]  Gang Hua,et al.  Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation , 2013, 2013 IEEE International Conference on Computer Vision.

[52]  Shuo Yang,et al.  From Facial Parts Responses to Face Detection: A Deep Learning Approach , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[53]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[54]  Anastasios Tefas,et al.  A Fast Deep Convolutional Neural Network for Face Detection in Big Visual Data , 2016, INNS Conference on Big Data.

[55]  Ying Wu,et al.  Detecting and Aligning Faces by Image Retrieval , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Jiri Matas,et al.  Weighted Sampling for Large-Scale Boosting , 2008, BMVC.

[57]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[58]  Erik Learned-Miller,et al.  FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[59]  Gang Hua,et al.  Efficient Boosted Exemplar-Based Face Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[60]  Bin Yang,et al.  Convolutional Channel Features , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[61]  Mei-Chen Yeh,et al.  Fast Human Detection Using a Cascade of Histograms of Oriented Gradients , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[62]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[63]  Dumitru Erhan,et al.  Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.