Paper information - End-to-end semantic face segmentation with conditional random fields as convolutional, recurrent and adversarial networks

Recent years have seen a sharp increase in the number of related yet distinct advances in semantic segmentation. Here, we tackle this problem by leveraging the respective strengths of these advances. That is, we formulate a conditional random field over a four-connected graph as end-to-end trainable convolutional and recurrent networks, and estimate them via an adversarial process. Importantly, our model learns not only unary potentials but also pairwise potentials, while aggregating multi-scale contexts and controlling higher-order inconsistencies. We evaluate our model on two standard benchmark datasets for semantic face segmentation, achieving state-of-the-art results on both of them.
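
To make the idea of a conditional random field over a four-connected graph more concrete, the following is a minimal sketch of mean-field inference on a pixel grid in which each pixel interacts only with its up, down, left and right neighbours. It is not the model described above: the abstract does not specify the inference scheme, so the mean-field updates, the function name `mean_field_4connected`, the score conventions (higher is better) and the fixed `compat` matrix are all assumptions made for illustration. In a trainable version, the neighbour aggregation would typically be a convolution, the repeated updates would be unrolled as a recurrent network, and both the unary and pairwise terms would be learned; the adversarial training is not shown.

```python
import math


def softmax(scores):
    """Normalise a list of scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_field_4connected(unary, compat, num_iters=5):
    """Approximate per-pixel label marginals on a four-connected grid.

    unary[y][x][l]: score of label l at pixel (y, x), higher is better
                    (hypothetical convention for this sketch).
    compat[l][k]:   score for neighbouring pixels taking labels l and k,
                    higher means more compatible (hypothetical, hand-set here).
    """
    h, w = len(unary), len(unary[0])
    n_labels = len(compat)
    # Initialise the marginals from the unary scores alone.
    q = [[softmax(unary[y][x]) for x in range(w)] for y in range(h)]
    for _ in range(num_iters):
        new_q = []
        for y in range(h):
            row = []
            for x in range(w):
                # Message passing: sum the marginals of the four neighbours.
                msg = [0.0] * n_labels
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        for k in range(n_labels):
                            msg[k] += q[ny][nx][k]
                # Compatibility transform, then add the unary term and renormalise.
                pair = [
                    sum(compat[l][k] * msg[k] for k in range(n_labels))
                    for l in range(n_labels)
                ]
                row.append(
                    softmax([unary[y][x][l] + pair[l] for l in range(n_labels)])
                )
            new_q.append(row)
        q = new_q
    return q


# Toy example: a 3x3 image where every pixel prefers label 0 except the
# centre, which weakly prefers label 1.
toy_unary = [[[2.0, 0.0] for _ in range(3)] for _ in range(3)]
toy_unary[1][1] = [0.0, 0.5]
potts = [[1.0, 0.0], [0.0, 1.0]]  # reward neighbours that share a label
smoothed = mean_field_4connected(toy_unary, potts)
centre_label = max(range(2), key=lambda l: smoothed[1][1][l])
```

In this toy case the pairwise term lets the four neighbours overrule the weak unary preference at the centre pixel, which is the kind of local label consistency a pairwise potential is meant to encourage.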
