AGRNet: Adaptive Graph Representation Learning and Reasoning for Face Parsing

Face parsing infers a pixel-wise label to each facial component, which has drawn much attention recently. Previous methods have shown their success in face parsing, which however overlook the correlation among facial components. As a matter of fact, the component-wise relationship is a critical clue in discriminating ambiguous pixels in facial area. To address this issue, we propose adaptive graph representation learning and reasoning over facial components, aiming to learn representative vertices that describe each component, exploit the component-wise relationship and thereby produce accurate parsing results against ambiguity. In particular, we devise an adaptive and differentiable graph abstraction method to represent the components on a graph via pixel-to-vertex projection under the initial condition of a predicted parsing map, where pixel features within a certain facial region are aggregated onto a vertex. Further, we explicitly incorporate the image edge as a prior in the model, which helps to discriminate edge and non-edge pixels during the projection, thus leading to refined parsing results along the edges. Then, our model learns and reasons over the relations among components by propagating information across vertices on the graph. Finally, the refined vertex features are projected back to pixel grids for the prediction of the final parsing map. To train our model, we propose a discriminative loss to penalize small distances between vertices in the feature space, which leads to distinct vertices with strong semantics. Experimental results show the superior performance of the proposed model on multiple face parsing datasets, along with the validation on the human parsing task to demonstrate the generalizability of our model.

[1]  Dacheng Tao,et al.  Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing , 2019, AAAI.

[2]  Ke Gong,et al.  Look into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Xiang Bai,et al.  Asymmetric Non-Local Neural Networks for Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Jianping Shi,et al.  Face Parsing via Recurrent Propagation , 2017, BMVC.

[6]  Yunchao Wei,et al.  Devil in the Details: Towards Accurate Single and Multiple Human Parsing , 2018, AAAI.

[7]  Yuning Jiang,et al.  Extensive Facial Landmark Localization with Coarse-to-Fine Convolutional Network Cascade , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[8]  Hao Shen,et al.  Grand Challenge of 106-Point Facial Landmark Localization , 2019, 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW).

[9]  Shuicheng Yan,et al.  Graph-Based Global Reasoning Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[11]  Ling Shao,et al.  Learning Compositional Neural Information Fusion for Human Parsing , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  Vishal M. Patel,et al.  Synthesis of High-Quality Visible Faces from Polarimetric Thermal Faces using Generative Adversarial Networks , 2018, International Journal of Computer Vision.

[13]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Lorenzo Porzi,et al.  In-place Activated BatchNorm for Memory-Optimized Training of DNNs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Pietro Liò,et al.  Graph Attention Networks , 2017, ICLR.

[17]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[18]  Xiaolin Hu,et al.  End-to-end face parsing via interlinked convolutional neural networks , 2020, Cognitive Neurodynamics.

[19]  Ming-Hsuan Yang,et al.  Multi-objective convolutional learning for face labeling , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Yunchao Wei,et al.  CCNet: Criss-Cross Attention for Semantic Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Abhinav Gupta,et al.  Beyond Grids: Learning Graph Representations for Visual Recognition , 2018, NeurIPS.

[22]  Luc Van Gool,et al.  Semantic Instance Segmentation for Autonomous Driving , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[23]  Philip H. S. Torr,et al.  Dual Graph Convolutional Network for Semantic Segmentation , 2019, BMVC.

[24]  Fang Zhao,et al.  Self-Supervised Neural Aggregation Networks for Human Parsing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[25]  Ira Kemelmacher-Shlizerman,et al.  Transfiguring portraits , 2016, ACM Trans. Graph..

[26]  Hailin Shi,et al.  Edge-aware Graph Representation Learning and Reasoning for Face Parsing , 2020, ECCV.

[27]  Meng Wang,et al.  Graphonomy: Universal Human Parsing via Graph Transfer Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Iasonas Kokkinos,et al.  Dense and Low-Rank Gaussian CRFs Using Deep Embeddings , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[29]  Yao Sun,et al.  Accurate Facial Image Parsing at Real-Time Speed , 2019, IEEE Transactions on Image Processing.

[30]  Yi Yang,et al.  Macro-Micro Adversarial Network for Human Parsing , 2018, ECCV.

[31]  Yi Yang,et al.  Every Node Counts: Self-Ensembling Graph Convolutional Networks for Semi-Supervised Learning , 2018, Pattern Recognit..

[32]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Hong Liu,et al.  Spatial Pyramid Based Graph Reasoning for Semantic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Xiaolin Hu,et al.  Interlinked Convolutional Neural Networks for Face Parsing , 2015, ISNN.

[35]  Hong Liu,et al.  Expectation-Maximization Attention Networks for Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[36]  Zheng Zhang,et al.  Disentangled Non-Local Neural Networks , 2020, ECCV.

[37]  B. S. Manjunath,et al.  Weakly Supervised Graph Based Semantic Segmentation by Learning Communities of Image-Parts , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[38]  Thomas S. Huang,et al.  Interactive Facial Feature Localization , 2012, ECCV.

[39]  Lingyun Wu,et al.  MaskGAN: Towards Diverse and Interactive Facial Image Manipulation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[41]  Dongbin Zhao,et al.  Graph-FCN for Image Semantic Segmentation , 2019, ISNN.

[42]  Yi Yang,et al.  Attention to Scale: Scale-Aware Semantic Image Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Ming Zeng,et al.  Face Parsing With RoI Tanh-Warping , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Ersin Yumer,et al.  Neural Face Editing with Intrinsic Image Disentangling , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Jun Fu,et al.  Dual Attention Network for Scene Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Ming-Hsuan Yang,et al.  Generative Face Completion , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Tao Mei,et al.  BraidNet: Braiding Semantics and Details for Accurate Human Parsing , 2019, ACM Multimedia.

[49]  Tal Hassner,et al.  On Face Segmentation, Face Swapping, and Face Perception , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[50]  Shuicheng Yan,et al.  A2-Nets: Double Attention Networks , 2018, NeurIPS.

[51]  Xiaogang Wang,et al.  Hierarchical face parsing via deep learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Liang Zheng,et al.  Category-Level Adversarial Adaptation for Semantic Segmentation Using Purified Features , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  Xavier Bresson,et al.  Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering , 2016, NIPS.

[54]  Hanjiang Lai,et al.  Learning Adaptive Receptive Fields for Deep Image Parsing Network , 2017, CVPR.

[55]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56]  Junqing Yu,et al.  Significance-Aware Information Bottleneck for Domain Adaptive Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[57]  Yici Cai,et al.  Look at Boundary: A Boundary-Aware Face Alignment Algorithm , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[58]  Fahad Shahbaz Khan,et al.  Learning Human-Object Interaction Detection Using Interaction Points , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Hailin Shi,et al.  A New Dataset and Boundary-Attention Semantic Segmentation for Face Parsing , 2020, AAAI.

[60]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[61]  Ling Luo,et al.  EHANet: An Effective Hierarchical Aggregation Network for Face Parsing , 2020, Applied Sciences.

[62]  Camille Couprie,et al.  Semantic Segmentation using Adversarial Networks , 2016, NIPS 2016.