Progressively Refined Face Detection Through Semantics-Enriched Representation Learning

Feature pyramids aim to learn multi-scale representations for detecting faces of various scales. However, they often lack adequate context across scales, especially when an image in the wild contains many tiny faces. In this paper, we propose an attention-guided, semantically enriched feature aggregation framework that learns a feature pyramid with rich semantics at all scales for face detection. Specifically, high-level abstract features are integrated directly into low-level representations through skip connections to retain as much semantic information as possible. In addition, an attention mechanism acts as a gate during feature fusion, emphasizing relevant features and suppressing irrelevant ones. Inspired by human visual perception of tiny faces, we also design a deep progressive refined loss (DPRL) to facilitate feature learning. Following these principles, we design and investigate various feature pyramid frameworks experimentally. Finally, we propose two typical structures for face detection, Centralized Attention Feature (CAF) and Distributed Attention Feature (DAF), both of which are in-place and end-to-end trainable. Extensive experiments across different aggregation architectures on four challenging face detection benchmarks demonstrate the superiority of our framework over state-of-the-art methods.
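
The fusion step described above, in which a coarse but semantically strong map is injected into a finer map through an attention gate, can be illustrated with a minimal sketch. This is not the CAF or DAF implementation. The single-channel maps, nearest-neighbour upsampling, the scalar parameters `w_low`, `w_high` and `bias`, and the additive `low + gate * high` combination are all assumptions chosen for illustration.

```python
import math
from typing import List

Grid = List[List[float]]  # a single-channel 2-D feature map (assumption)


def upsample_nearest(x: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling of a 2-D map by an integer factor."""
    return [[x[i // factor][j // factor] for j in range(len(x[0]) * factor)]
            for i in range(len(x) * factor)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(low: Grid, high: Grid, w_low: float = 1.0,
                 w_high: float = 1.0, bias: float = 0.0) -> Grid:
    """Hypothetical attention-gated skip connection.

    The high-level map is upsampled to the low-level resolution; a per-position
    gate in (0, 1), computed from both inputs, scales the high-level response
    before it is added to the low-level one.
    """
    factor = len(low) // len(high)
    if len(low) != len(high) * factor or len(low[0]) != len(high[0]) * factor:
        raise ValueError("low must be an integer multiple of high in both dimensions")
    high_up = upsample_nearest(high, factor)
    fused: Grid = []
    for row_l, row_h in zip(low, high_up):
        out_row = []
        for l, h in zip(row_l, row_h):
            gate = sigmoid(w_low * l + w_high * h + bias)  # attention gate
            out_row.append(l + gate * h)  # gated injection of semantics
        fused.append(out_row)
    return fused


low = [[0.0] * 4 for _ in range(4)]    # fine, semantically weak map
high = [[2.0, -2.0], [0.0, 0.0]]       # coarse, semantically strong map
fused = gated_fusion(low, high)
```

With `low` set to zeros, the strong positive response passes through at roughly 0.88 of its value (the gate is sigmoid(2)), while the negative response is scaled by roughly 0.12, showing how a gate can let useful high-level evidence through while damping the rest. In a real detector the gate would be computed by learned convolutions over many channels rather than by three scalars.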