论文信息 - Multi-path Region-Based Convolutional Neural Network for Accurate Detection of Unconstrained "Hard Faces"

Multi-path Region-Based Convolutional Neural Network for Accurate Detection of Unconstrained "Hard Faces"

Large-scale variations still pose a challenge in unconstrained face detection. To the best of our knowledge, no current face detection algorithm can detect a face as large as 800 x 800 pixels while simultaneously detecting another one as small as 8 x 8 pixels within a single image with equally high accuracy. We propose a two-stage cascaded face detection framework, Multi-Path Region-based Convolutional Neural Network (MP-RCNN), that seamlessly combines a deep neural network with a classic learning strategy, to tackle this challenge. The first stage is a Multi-Path Region Proposal Network (MP-RPN) that proposes faces at three different scales. It simultaneously utilizes three parallel outputs of the convolutional feature maps to predict multi-scale candidate face regions. The "atrous" convolution trick (convolution with up-sampled filters) and a newly proposed sampling layer for "hard" examples are embedded in MP-RPN to further boost its performance. The second stage is a Boosted Forests classifier, which utilizes deep facial features pooled from inside the candidate face regions as well as deep contextual features pooled from a larger region surrounding the candidate face regions. This step is included to further remove hard negative samples. Experiments show that this approach achieves state-of-the-art face detection performance on the WIDER FACE dataset "hard" partition, outperforming the former best result by 9.6% for the Average Precision.

Martin D. Levine | Yuguang Liu

[1] Rama Chellappa,et al. HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] C. Lawrence Zitnick,et al. Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[3] Luc Van Gool,et al. Face Detection without Bells and Whistles , 2014, ECCV.

[4] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[5] Rogério Schmidt Feris,et al. A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.

[6] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Shuo Yang,et al. WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[10] Junjie Yan,et al. Face detection by structural models , 2014, Image Vis. Comput..

[11] Yuning Jiang,et al. UnitBox: An Advanced Object Detection Network , 2016, ACM Multimedia.

[12] Anastasios Tefas,et al. A Fast Deep Convolutional Neural Network for Face Detection in Big Visual Data , 2016, INNS Conference on Big Data.

[13] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[15] Li-Jia Li,et al. Multi-view Face Detection Using Deep Convolutional Neural Networks , 2015, ICMR.

[16] Yu Qiao,et al. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[17] Shengcai Liao,et al. A Fast and Accurate Unconstrained Face Detector , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18] Bin Yang,et al. Aggregate channel features for multi-view face detection , 2014, IEEE International Joint Conference on Biometrics.

[19] Liang Lin,et al. Is Faster R-CNN Doing Well for Pedestrian Detection? , 2016, ECCV.

[20] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.

[21] Jian Sun,et al. Joint Cascade Face Detection and Alignment , 2014, ECCV.

[22] Erik Learned-Miller,et al. FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[23] Bin Yang,et al. Convolutional Channel Features , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[24] Tao Wang,et al. Face detection using SURF cascade , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[25] Deva Ramanan,et al. Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Paul A. Viola,et al. Robust Real-Time Face Detection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[27] Shuo Yang,et al. From Facial Parts Responses to Face Detection: A Deep Learning Approach , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[28] Gang Hua,et al. A convolutional neural network cascade for face detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Abhinav Gupta,et al. Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Marios Savvides,et al. CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection , 2016, ArXiv.

[31] Wei Liu,et al. ParseNet: Looking Wider to See Better , 2015, ArXiv.