Dual-Structure Disentangling Variational Generation for Data-Limited Face Parsing

Deep learning based face parsing methods have attained state-of-the-art performance in recent years. Their superior performance heavily depends on the large-scale annotated training data. However, it is expensive and time-consuming to construct a large-scale pixel-level manually annotated dataset for face parsing. To alleviate this issue, we propose a novel Dual-Structure Disentangling Variational Generation (D2VG) network. Benefiting from the interpretable factorized latent disentanglement in VAE, D2VG can learn a joint structural distribution of facial image and its corresponding parsing map. Owing to these, it can synthesize large-scale paired face images and parsing maps from a standard Gaussian distribution. Then, we adopt both manually annotated and synthesized data to train a face parsing model in a supervised way. Since there are inaccurate pixel-level labels in synthesized parsing maps, we introduce a coarseness-tolerant learning algorithm, to effectively handle these noisy or uncertain labels. In this way, we can significantly boost the performance of face parsing. Extensive quantitative and qualitative results on HELEN, CelebAMask-HQ and LaPa demonstrate the superiority of our methods.

[1]  Yuxin Chen,et al.  Pixel Level Data Augmentation for Semantic Image Segmentation Using Generative Adversarial Networks , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Frédo Durand,et al.  Data augmentation using learned transforms for one-shot medical image segmentation , 2019, ArXiv.

[3]  Li Fei-Fei,et al.  MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels , 2017, ICML.

[4]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Lingyun Wu,et al.  MaskGAN: Towards Diverse and Interactive Facial Image Manipulation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Yanyao Shen,et al.  Learning with Bad Training Data via Iterative Trimmed Loss Minimization , 2018, ICML.

[7]  Thomas S. Huang,et al.  Interactive Facial Feature Localization , 2012, ECCV.

[8]  Aritra Ghosh,et al.  Robust Loss Functions under Label Noise for Deep Neural Networks , 2017, AAAI.

[9]  Weixiang Liu,et al.  Functional gradient ascent for Probit regression , 2012, Pattern Recognit..

[10]  Nuno Vasconcelos,et al.  On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost , 2008, NIPS.

[11]  Takeharu Eda,et al.  Effective Data Augmentation with Multi-Domain Learning GANs , 2019, AAAI.

[12]  Tieniu Tan,et al.  IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis , 2018, NeurIPS.

[13]  Tao Mei,et al.  A High-Efficiency Framework for Constructing Large-Scale Face Parsing Benchmark , 2019, ArXiv.

[14]  Tieniu Tan,et al.  A Light CNN for Deep Face Representation With Noisy Labels , 2015, IEEE Transactions on Information Forensics and Security.

[15]  Hanjiang Lai,et al.  Learning Adaptive Receptive Fields for Deep Image Parsing Network , 2017, CVPR.

[16]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[17]  Changick Kim,et al.  Self-Ensembling With GAN-Based Data Augmentation for Domain Adaptation in Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  J. Paul Brooks,et al.  Support Vector Machines with the Ramp Loss and the Hard Margin Loss , 2011, Oper. Res..

[19]  Mert R. Sabuncu,et al.  Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels , 2018, NeurIPS.

[20]  Shang-Hong Lai,et al.  AugGAN: Cross Domain Adaptation with GAN-Based Data Augmentation , 2018, ECCV.

[21]  Xiaolin Hu,et al.  Interlinked Convolutional Neural Networks for Face Parsing , 2015, ISNN.

[22]  Zengchang Qin,et al.  Emotion Classification with Data Augmentation Using Generative Adversarial Networks , 2018, PAKDD.

[23]  Jianping Shi,et al.  Face Parsing via Recurrent Propagation , 2017, BMVC.

[24]  Yuxiao Hu,et al.  MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition , 2016, ECCV.

[25]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[26]  Yao Sun,et al.  Learning adaptive receptive fields for deep image parsing networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Zhe L. Lin,et al.  Exemplar-Based Face Parsing , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Ran He,et al.  Dual Variational Generation for Low-Shot Heterogeneous Face Recognition , 2019, NeurIPS.

[29]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[30]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  James Bailey,et al.  Symmetric Cross Entropy for Robust Learning With Noisy Labels , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Aditya Krishna Menon,et al.  Learning with Symmetric Label Noise: The Importance of Being Unhinged , 2015, NIPS.

[33]  Ming-Hsuan Yang,et al.  Multi-objective convolutional learning for face labeling , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Ming Zeng,et al.  Face Parsing With RoI Tanh-Warping , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Hui Zhang,et al.  Residual Encoder Decoder Network and Adaptive Prior for Face Parsing , 2018, AAAI.