Accurate Facial Image Parsing at Real-Time Speed

In this paper, we propose a design scheme for deep networks for face parsing that achieves promising accuracy at real-time inference speed. By analyzing the differences between general image parsing and face parsing, we first revisit the structure of the traditional fully convolutional network (FCN) and adapt it to the unique properties of the face parsing task. In particular, we propose the concept of the Normalized Receptive Field to provide more insight into network design. We then introduce a novel loss function, the Statistical Contextual Loss, which integrates richer contextual information and regularizes features during training. To further accelerate the model, we propose a semi-supervised distillation scheme that effectively transfers the learned knowledge to a lighter network. Extensive experiments on the LFW and Helen datasets demonstrate the significant superiority of the new design scheme in both efficacy and efficiency.
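To make the distillation idea concrete, the following is a minimal sketch of a generic soft-target distillation loss for a single pixel, in the style of standard knowledge distillation. It is not the scheme proposed in this paper, whose details are not given in the abstract. The function names, the `temperature` and `alpha` parameters, and the rule that unlabeled pixels use only the teacher's soft prediction are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw class scores into probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))


def pixel_distillation_loss(student_logits, teacher_logits, label=None,
                            temperature=2.0, alpha=0.5):
    """Hypothetical per-pixel distillation loss (illustrative assumption).

    The soft term pulls the student's softened prediction toward the
    teacher's. When a ground-truth label is available, a hard
    cross-entropy term is mixed in with weight (1 - alpha). For an
    unlabeled pixel (label is None) only the soft term is used, which is
    one simple way to let unlabeled images contribute to training.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_term = cross_entropy(soft_teacher, soft_student)
    if label is None:
        return soft_term
    hard_target = [1.0 if k == label else 0.0 for k in range(len(student_logits))]
    hard_term = cross_entropy(hard_target, softmax(student_logits))
    return alpha * soft_term + (1.0 - alpha) * hard_term


# Example: three classes, teacher strongly prefers class 0.
teacher = [4.0, 0.0, -2.0]
agreeing_student = [3.5, 0.2, -1.5]
disagreeing_student = [-2.0, 0.0, 4.0]
print(pixel_distillation_loss(agreeing_student, teacher))       # unlabeled pixel
print(pixel_distillation_loss(disagreeing_student, teacher))    # larger loss
print(pixel_distillation_loss(agreeing_student, teacher, label=0))
```

In a full segmentation setting this quantity would be averaged over all pixels of a batch, and the teacher would be the larger, already trained network.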
