A multi-scale fusion convolutional neural network for face detection

Many methods have been proposed for automatic face detection. The problem remains difficult because of variations in background, illumination, pose, and facial expression. Recently, deep learning approaches have achieved impressive performance on face detection. In this paper, we propose a model named Multi-Scale Fusion Convolutional Neural Network (MSF-CNN) for training a face detector. The model is trained as a convolutional neural network, and detection follows the sliding-window structure of the Viola & Jones detector. In particular, during feature extraction we adopt a multi-scale feature fusion design that uses convolution kernels of different scales. The results are as follows. First, the fused multi-scale features are richer than single-scale features, and classification accuracy is higher than with a single scale. Second, the model is less complex than existing cascaded CNN methods. Third, the model is learned end to end, in contrast to the separate training of each stage in a cascade. The proposed model also outperforms previous methods on some well-known face detection benchmark datasets.
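The two ideas named in the abstract, fusing features from kernels of different scales and scanning the image with sliding windows, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the architecture of MSF-CNN: the kernel sizes (1, 3, 5), the use of channel concatenation as the fusion step, the averaging kernels standing in for learned weights, and the window size and stride are all hypothetical, since the abstract does not state them.

```python
def conv2d_same(image, kernel):
    """Cross-correlate a 2D list with a square, odd-sized kernel.

    Zero padding keeps the output the same size as the input.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def box_kernel(size):
    """Hypothetical stand-in for a learned kernel: a normalized averaging filter."""
    v = 1.0 / (size * size)
    return [[v] * size for _ in range(size)]


def multi_scale_fusion(image, kernel_sizes=(1, 3, 5)):
    """Apply one kernel per scale and stack the results as channels.

    Concatenation is an assumed fusion operator; the kernel sizes are
    illustrative and not taken from the paper.
    """
    return [conv2d_same(image, box_kernel(s)) for s in kernel_sizes]


def sliding_windows(height, width, window, stride):
    """Yield (top, left) corners of square windows that fit inside the image."""
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield top, left


if __name__ == "__main__":
    image = [[1.0] * 6 for _ in range(6)]
    fused = multi_scale_fusion(image)
    print(len(fused), "feature maps of size", len(fused[0]), "x", len(fused[0][0]))
    print(list(sliding_windows(6, 6, window=4, stride=2)))
```

In a trained detector, each window would be passed through the fused feature extractor and a classifier that scores it as face or non-face; the sketch stops at producing the fused channels and the window positions.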
