Face Detection and Segmentation with Generalized Intersection over Union Based on Mask R-CNN

Face detection, a long-standing research focus in computer vision and information security, has developed extensively over the past few decades. However, most existing detection methods only localize faces with bounding boxes, so background noise enters the face features and detection accuracy is limited. To overcome these drawbacks, this paper proposes G-Mask, a face detection and segmentation method based on Mask R-CNN that uses Generalized Intersection over Union (GIoU). In G-Mask, ResNet-101 extracts features, a Region Proposal Network (RPN) generates regions of interest (RoIs), and RoIAlign preserves their exact spatial locations so that a Fully Convolutional Network (FCN) can generate a binary mask for each face. In particular, to improve performance on multi-scale face detection, GIoU is used as the bounding-box regression loss. Furthermore, a new face dataset with segmentation annotations is constructed to train the model. Experimental results on the well-known FDDB and AFW benchmarks show that G-Mask achieves promising face detection performance compared with Faster R-CNN and the original Mask R-CNN, and that it also performs instance-level face segmentation while detecting.
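To make the GIoU bounding-box loss concrete, the following is a minimal sketch, not the authors' implementation. It assumes axis-aligned boxes in (x1, y1, x2, y2) format with positive area and uses the standard definition GIoU = IoU − |C \ (A ∪ B)| / |C|, where C is the smallest box enclosing both A and B; the loss is 1 − GIoU. The names `giou` and `giou_loss` are hypothetical, and a training version would operate on batched tensors so that gradients can flow.

```python
from typing import Tuple

# Hypothetical box type: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of an axis-aligned box (assumes positive width and height)."""
    return (b[2] - b[0]) * (b[3] - b[1])


def giou(a: Box, b: Box) -> float:
    """Generalized IoU between two axis-aligned boxes, in (-1, 1]."""
    # Intersection rectangle, clamped to zero when the boxes do not overlap.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    iou = inter / union
    # Smallest enclosing box C of the two boxes.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c_area = cw * ch
    # Penalize the fraction of C that the union does not cover.
    return iou - (c_area - union) / c_area


def giou_loss(pred: Box, target: Box) -> float:
    """Bounding-box regression loss 1 - GIoU, in [0, 2)."""
    return 1.0 - giou(pred, target)


if __name__ == "__main__":
    gt = (0.0, 0.0, 1.0, 1.0)
    near = (2.0, 0.0, 3.0, 1.0)  # disjoint from gt, IoU = 0
    far = (5.0, 0.0, 6.0, 1.0)   # also disjoint, IoU = 0
    # Plain IoU cannot tell these apart; GIoU ranks the nearer box higher.
    print(round(giou(gt, near), 4))  # -0.3333
    print(round(giou(gt, far), 4))   # -0.6667
```

The usage lines illustrate why GIoU is preferred over a plain 1 − IoU loss: for any two non-overlapping boxes, IoU is 0 and the IoU loss gives no signal about how far apart they are, whereas GIoU still decreases as the boxes move apart. Like IoU, GIoU is also invariant to scaling both boxes by the same factor, which is one plausible reason it suits faces of very different sizes.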
