A High-Efficiency Framework for Constructing Large-Scale Face Parsing Benchmark

Face parsing, which is to assign a semantic label to each pixel in face images, has recently attracted increasing interest due to its huge application potentials. Although many face related fields (e.g., face recognition and face detection) have been well studied for many years, the existing datasets for face parsing are still severely limited in terms of the scale and quality, e.g., the widely used Helen dataset only contains 2,330 images. This is mainly because pixel-level annotation is a high cost and time-consuming work, especially for the facial parts without clear boundaries. The lack of accurate annotated datasets becomes a major obstacle in the progress of face parsing task. It is a feasible way to utilize dense facial landmarks to guide the parsing annotation. However, annotating dense landmarks on human face encounters the same issues as the parsing annotation. To overcome the above problems, in this paper, we develop a high-efficiency framework for face parsing annotation, which considerably simplifies and speeds up the parsing annotation by two consecutive modules. Benefit from the proposed framework, we construct a new Dense Landmark Guided Face Parsing (LaPa) benchmark. It consists of 22,000 face images with large variations in expression, pose, occlusion, etc. Each image is provided with accurate annotation of a 11-category pixel-level label map along with coordinates of 106-point landmarks. To the best of our knowledge, it is currently the largest public dataset for face parsing. To make full use of our LaPa dataset with abundant face shape and boundary priors, we propose a simple yet effective Boundary-Sensitive Parsing Network (BSPNet). Our network is taken as a baseline model on the proposed LaPa dataset, and meanwhile, it achieves the state-of-the-art performance on the Helen dataset without resorting to extra face alignment.

[1]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Jiri Matas,et al.  XM2VTSDB: The Extended M2VTS Database , 1999 .

[3]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Yici Cai,et al.  Look at Boundary: A Boundary-Aware Face Alignment Algorithm , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Yi Yang,et al.  Style Aggregated Network for Facial Landmark Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Stefanos Zafeiriou,et al.  300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[7]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[8]  Thomas S. Huang,et al.  Interactive Facial Feature Localization , 2012, ECCV.

[9]  Christopher Joseph Pal,et al.  Localizing Facial Keypoints with Global Descriptor Search, Neighbour Alignment and Locally Linear Models , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[10]  Honglak Lee,et al.  Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Michel F. Valstar,et al.  A CNN Cascade for Landmark Guided Semantic Part Segmentation , 2016, ECCV Workshops.

[12]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Vishal M. Patel,et al.  Synthesis of High-Quality Visible Faces from Polarimetric Thermal Faces using Generative Adversarial Networks , 2018, International Journal of Computer Vision.

[14]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Thomas Vetter,et al.  Face Recognition Based on Fitting a 3D Morphable Model , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Xiaochun Cao,et al.  Beauty eMakeup: A Deep Makeup Transfer System , 2016, ACM Multimedia.

[18]  Xiaogang Wang,et al.  Hierarchical face parsing via deep learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Dani Lischinski,et al.  Spectral Matting , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[21]  Zhe L. Lin,et al.  Exemplar-Based Face Parsing , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  George Papandreou,et al.  Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[23]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Jianping Shi,et al.  Face Parsing via Recurrent Propagation , 2017, BMVC.

[25]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Peter Robinson,et al.  Constrained Local Neural Fields for Robust Facial Landmark Detection in the Wild , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[27]  Ming-Hsuan Yang,et al.  Multi-objective convolutional learning for face labeling , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Hui Zhang,et al.  Residual Encoder Decoder Network and Adaptive Prior for Face Parsing , 2018, AAAI.

[30]  Yunchao Wei,et al.  Devil in the Details: Towards Accurate Single and Multiple Human Parsing , 2018, AAAI.

[31]  Ira Kemelmacher-Shlizerman,et al.  The MegaFace Benchmark: 1 Million Faces for Recognition at Scale , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Samuel J Kuzminski,et al.  The Devil Is in the Details. , 2017, Journal of the American College of Radiology : JACR.

[33]  Yuning Jiang,et al.  Extensive Facial Landmark Localization with Coarse-to-Fine Convolutional Network Cascade , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[34]  Gerhard Rigoll,et al.  Robust Facial Landmark Detection via a Fully-Convolutional Local-Global Context Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Timothy F. Cootes,et al.  Active Shape Models-Their Training and Application , 1995, Comput. Vis. Image Underst..

[36]  Xiaogang Wang,et al.  Deep Convolutional Network Cascade for Facial Point Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Jonathan Warrell,et al.  Labelfaces: Parsing facial features by multiclass labeling with an epitome prior , 2009, 2009 16th IEEE International Conference on Image Processing (ICIP).

[39]  Hanjiang Lai,et al.  Learning Adaptive Receptive Fields for Deep Image Parsing Network , 2017, CVPR.

[40]  Michel F. Valstar,et al.  Guided Unsupervised Learning of Mode Specific Models for Facial Point Detection in the Wild , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[41]  Hanjiang Lai,et al.  Deep Recurrent Regression for Facial Landmark Detection , 2015, IEEE Transactions on Circuits and Systems for Video Technology.

[42]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Horst Bischof,et al.  Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[44]  Qingshan Liu,et al.  Stacked Hourglass Network for Robust Facial Landmark Localisation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[45]  Niall Boyce Devil in the details , 2016, The Lancet.

[46]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[47]  Hironobu Fujiyoshi,et al.  Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling , 2015, IPSJ Trans. Comput. Vis. Appl..