SSFD+: A Robust Two-Stage Face Detector

Face detectors based on deep learning have made great progress in detecting multi-scale faces by using multi-scale feature maps and input pyramids. However, both techniques increase the training difficulty and complexity of the network. In this paper, we aim to simplify the network architecture for multi-scale face detection while achieving comparable performance. To enable the network to learn multi-scale facial features from a single-scale feature map and a single-scale input image, we (1) conducted a comparative study to investigate which layer contributes most to detecting multi-scale faces and (2) designed and implemented a simple network structure that improves multi-scale face detection by incorporating additional contextual information. SSFD+ achieves mAPs of (91.3%, 90.3%, 83.1%) and (92.4%, 90.9%, 83.7%) on the (easy, medium, hard) subsets of the WIDER FACE validation and testing sets, respectively, and promising results on the FDDB, PASCAL Faces, and AFW datasets.
