A convolutional neural network cascade for face detection

In real-world face detection, large visual variations, such as those due to pose, expression, and lighting, demand an advanced discriminative model to accurately differentiate faces from the backgrounds. Consequently, effective models for the problem tend to be computationally prohibitive. To address these two conflicting challenges, we propose a cascade architecture built on convolutional neural networks (CNNs) with very powerful discriminative capability, while maintaining high performance. The proposed CNN cascade operates at multiple resolutions, quickly rejects the background regions in the fast low resolution stages, and carefully evaluates a small number of challenging candidates in the last high resolution stage. To improve localization effectiveness, and reduce the number of candidates at later stages, we introduce a CNN-based calibration stage after each of the detection stages in the cascade. The output of each calibration stage is used to adjust the detection window position for input to the subsequent stage. The proposed method runs at 14 FPS on a single CPU core for VGA-resolution images and 100 FPS using a GPU, and achieves state-of-the-art detection performance on two public face detection benchmarks.

[1]  R. Vaillant,et al.  Original approach for the localisation of objects in images , 1994 .

[2]  Takeo Kanade,et al.  Rotation invariant neural network-based face detection , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[3]  Takeo Kanade,et al.  Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[5]  Takeo Kanade,et al.  Rotation Invariant Neural Network-Based Face Detection , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[6]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[7]  Rainer Lienhart,et al.  An extended set of Haar-like features for rapid object detection , 2002, Proceedings. International Conference on Image Processing.

[8]  Christophe Garcia,et al.  A neural architecture for fast and robust face detection , 2002, Object recognition supported by user interaction for service robots.

[9]  Paul A. Viola,et al.  Fast Multi-view Face Detection , 2003 .

[10]  Yann LeCun,et al.  Synergistic Face Detection and Pose Estimation with Energy-Based Models , 2004, J. Mach. Learn. Res..

[11]  Wei Zhang,et al.  Real-time Accurate Object Detection using Multiple Resolutions , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[13]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Zhengyou Zhang,et al.  A Survey of Recent Advances in Face Detection , 2010 .

[15]  Charless C. Fowlkes,et al.  Multiresolution Models for Object Detection , 2010, ECCV.

[16]  Erik Learned-Miller,et al.  FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[17]  Erik G. Learned-Miller,et al.  Online domain adaptation of a pre-trained cascade of classifiers , 2011, CVPR 2011.

[18]  Horst Bischof,et al.  Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[19]  Tao Wang,et al.  Face detection using SURF cascade , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[20]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[21]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Gang Hua,et al.  Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation , 2013, 2013 IEEE International Conference on Computer Vision.

[23]  Ying Wu,et al.  Detecting and Aligning Faces by Image Retrieval , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Igor S. Pandzic,et al.  A method for object detection based on pixel intensity comparisons , 2013, ArXiv.

[25]  Bin Yang,et al.  Aggregate channel features for multi-view face detection , 2014, IEEE International Joint Conference on Biometrics.

[26]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Luc Van Gool,et al.  Face Detection without Bells and Whistles , 2014, ECCV.

[28]  Gang Hua,et al.  Efficient Boosted Exemplar-Based Face Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Shiguang Shan,et al.  Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment , 2014, ECCV.

[30]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[31]  Junjie Yan,et al.  Face detection by structural models , 2014, Image Vis. Comput..

[32]  Jian Sun,et al.  Joint Cascade Face Detection and Alignment , 2014, ECCV.

[33]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[34]  2013 Ieee International Conference on Computer Vision , 2022 .