Feature Agglomeration Networks for Single Stage Face Detection

Recent years have witnessed promising results of face detection using deep learning, especially for the family of region-based convolutional neural networks (R-CNN) methods and their variants. Despite making remarkable progresses, face detection in the wild remains an open research challenge especially when detecting faces at vastly different scales and characteristics. In this paper, we propose a novel framework of "Feature Agglomeration Networks" (FAN) to build a new single stage face detector, which not only achieves state-of-the-art performance but also runs efficiently. As inspired by the recent success of Feature Pyramid Networks (FPN) \cite{lin2016fpn} for generic object detection, the core idea of our framework is to exploit inherent multi-scale features of a single convolutional neural network to detect faces of varied scales and characteristics by aggregating higher-level semantic feature maps of different scales as contextual cues to augment lower-level feature maps via a hierarchical agglomeration manner at marginal extra computation cost. Unlike the existing FPN approach, we construct our FAN architecture using a new {\it Agglomerative Connection} module and further propose a {\it Hierarchical Loss} to effectively train the FAN model. We evaluate the proposed FAN detector on several public face detection benchmarks and achieved new state-of-the-art results with real-time detection speed on GPU.

[1]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[3]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[4]  Fuchun Sun,et al.  HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Xu Tang,et al.  PyramidBox: A Context-assisted Single Shot Face Detector , 2018, ECCV.

[6]  Georgios Tziritas,et al.  Face Detection Using Quantized Skin Color Regions Merging and Wavelet Packet Analysis , 1999, IEEE Trans. Multim..

[7]  Soumith Chintala,et al.  A MultiPath Network for Object Detection , 2016, BMVC.

[8]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[9]  Fuchun Sun,et al.  RON: Reverse Connection with Objectness Prior Networks for Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Luc Van Gool,et al.  Face Detection without Bells and Whistles , 2014, ECCV.

[11]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Hao Wang,et al.  Detecting Faces Using Region-based Fully Convolutional Networks , 2017 .

[13]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[14]  Erik Learned-Miller,et al.  FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[15]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Rogério Schmidt Feris,et al.  A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.

[17]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[18]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Peiyun Hu,et al.  Finding Tiny Faces , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Raquel Urtasun,et al.  Understanding the Effective Receptive Field in Deep Convolutional Neural Networks , 2016, NIPS.

[21]  Shuo Yang,et al.  From Facial Parts Responses to Face Detection: A Deep Learning Approach , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Xiaolin Hu,et al.  Scale-Aware Face Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[25]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[26]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Shuo Yang,et al.  Face Detection through Scale-Friendly Deep Convolutional Networks , 2017, ArXiv.

[28]  Steven C. H. Hoi,et al.  Face Detection using Deep Learning: An Improved Faster RCNN Approach , 2017, Neurocomputing.

[29]  Larry S. Davis,et al.  SSH: Single Stage Headless Face Detector , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[31]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[32]  Jiri Matas,et al.  Weighted Sampling for Large-Scale Boosting , 2008, BMVC.

[33]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[34]  Shuicheng Yan,et al.  Scale-Aware Fast R-CNN for Pedestrian Detection , 2015, IEEE Transactions on Multimedia.

[35]  Gang Hua,et al.  Supervised Transformer Network for Efficient Face Detection , 2016, ECCV.

[36]  Hao Wang,et al.  Face R-CNN , 2017, ArXiv.

[37]  Shifeng Zhang,et al.  S^3FD: Single Shot Scale-Invariant Face Detector , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Ran Tao,et al.  Seeing Small Faces from Robust Anchor's Perspective , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Mohan M. Trivedi,et al.  To boost or not to boost? On the limits of boosted trees for object detection , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[41]  Marios Savvides,et al.  CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection , 2016, ArXiv.