End-to-end semantic face segmentation with conditional random fields as convolutional, recurrent and adversarial networks

Recent years have seen a sharp increase in the number of related yet distinct advances in semantic segmentation. Here, we tackle semantic segmentation by leveraging the respective strengths of these advances. Specifically, we formulate a conditional random field over a four-connected graph as end-to-end trainable convolutional and recurrent networks, and estimate it via an adversarial process. Importantly, our model learns not only unary potentials but also pairwise potentials, while aggregating multi-scale contexts and controlling higher-order inconsistencies. We evaluate our model on two standard benchmark datasets for semantic face segmentation and achieve state-of-the-art results on both.
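The abstract does not spell out the inference procedure. As a rough illustration only, the NumPy sketch below shows generic mean-field inference for a CRF on a four-connected pixel grid, unrolled for a fixed number of iterations so that it behaves like a weight-shared recurrent network. It is not the authors' implementation: the function names (`softmax`, `neighbor_sum`, `mean_field`), the label-compatibility matrix `compat` standing in for the learned pairwise potentials, the Potts-style example, and the iteration count are all hypothetical choices made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax over the label axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def neighbor_sum(q):
    """Sum the marginals of the four grid neighbours of every pixel.

    q has shape (H, W, L). Pixels outside the image contribute zero.
    This is equivalent to a fixed cross-shaped 3x3 convolution.
    """
    s = np.zeros_like(q)
    s[1:, :] += q[:-1, :]   # neighbour above
    s[:-1, :] += q[1:, :]   # neighbour below
    s[:, 1:] += q[:, :-1]   # neighbour to the left
    s[:, :-1] += q[:, 1:]   # neighbour to the right
    return s


def mean_field(unary, compat, n_iters=5):
    """Hypothetical mean-field inference for a 4-connected grid CRF.

    unary:  (H, W, L) unary energies (lower means more likely).
    compat: (L, L) pairwise energy mu(l, l') between neighbouring labels;
            in a trained model this would be learned, here it is given.
    Returns approximate marginals Q with shape (H, W, L).
    """
    q = softmax(-unary)                   # initialise from the unaries alone
    for _ in range(n_iters):              # each step = one recurrent iteration
        msg = neighbor_sum(q)             # message passing over the grid
        pairwise = msg @ compat.T         # pairwise[i, l] = sum_l' mu(l, l') * msg[i, l']
        q = softmax(-unary - pairwise)    # add unaries back and renormalise
    return q


if __name__ == "__main__":
    # Toy 3x3 image with two labels: every pixel prefers label 0,
    # except the centre, which mildly prefers label 1.
    unary = np.zeros((3, 3, 2))
    unary[..., 1] = 1.0
    unary[1, 1] = [0.5, 0.0]
    compat = 2.0 * (1.0 - np.eye(2))      # Potts model: penalise disagreeing neighbours
    q = mean_field(unary, compat)
    print(q.argmax(axis=-1))              # the centre pixel flips from label 1 to label 0
```

Because every iteration reuses the same `unary` and `compat`, unrolling the loop yields a recurrent network whose per-step message passing is a convolution, which is what makes this kind of CRF trainable end to end by backpropagation. How the paper parameterises and learns its pairwise potentials, and how the adversarial loss enters training, is not specified in the abstract and is not modelled here.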
