Multi-class Human Body Parsing with Edge-Enhancement Network

Single human parsing aims at partitioning an image into semantically consistent regions belonging to the body parts or clothing items, which has gained remarkable improvement owing to a wide range of proposed methodologies. From the perspective of the loss design, besides the parsing loss of the final output, most existing studies target on exploiting multiple other losses to enhance parsing results, which is hard to make the model reach balanced condition by adjusting their ratios and may weaken the potential of some losses. In this work, we propose an edge enhancement module to emphasize the potential of edge loss and boundary information. At the same time, local and global information will be explored for complex multi-class human body parsing problem by densely connected atrous spatial pyramid pooling. This scheme results in a simple yet powerful Edge-Enhancement Network (EEN). Extensive experiments demonstrate that EEN achieves 56.55% mIoU on LIP dataset and 62.60% mIoU on CIHP dataset, which outperform the state-of-the-arts by 3.45% and 4.02%, respectively. The code of EEN is available at https://github.com/huangxi6/EEN.

[1]  Cewu Lu,et al.  Weakly and Semi Supervised Human Body Part Parsing via Pose-Guided Knowledge Transfer , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Sanja Fidler,et al.  Detect What You Can: Detecting and Representing Objects Using Holistic Models and Body Parts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Ke Gong,et al.  Look into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Sanja Fidler,et al.  Be Your Own Prada: Fashion Synthesis with Structural Coherence , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Yi Yang,et al.  Macro-Micro Adversarial Network for Human Parsing , 2018, ECCV.

[7]  Fei Wang,et al.  SE2Net: Siamese Edge-Enhancement Network for Salient Object Detection , 2019, ArXiv.

[8]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[9]  Jiangjiang Liu,et al.  Three Birds One Stone: A Unified Framework for Salient Object Segmentation, Edge Detection and Skeleton Extraction , 2018, ArXiv.

[10]  Shuicheng Yan,et al.  Human Parsing with Contextualized Convolutional Neural Network , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Yuan Hu,et al.  Dynamic Feature Fusion for Semantic Edge Detection , 2019, IJCAI.

[12]  C. Lawrence Zitnick,et al.  Fast Edge Detection Using Structured Forests , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Francesc Moreno-Noguer,et al.  A High Performance CRF Model for Clothes Parsing , 2014, ACCV.

[14]  George Papandreou,et al.  Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[15]  Lorenzo Porzi,et al.  In-place Activated BatchNorm for Memory-Optimized Training of DNNs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Liang Lin,et al.  Look into Person: Joint Body Parsing & Pose Estimation Network and a New Benchmark , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Meng Wang,et al.  Graphonomy: Universal Human Parsing via Graph Transfer Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Ming Yang,et al.  Instance-level Human Parsing via Part Grouping Network , 2018, ECCV.

[21]  Yunchao Wei,et al.  Devil in the Details: Towards Accurate Single and Multiple Human Parsing , 2018, AAAI.

[22]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Shuicheng Yan,et al.  Human Pose Estimation with Parsing Induced Learner , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Anlong Ming,et al.  Context-Constrained Accurate Contour Extraction for Occlusion Edge Detection , 2019, 2019 IEEE International Conference on Multimedia and Expo (ICME).

[25]  Kun Yu,et al.  DenseASPP for Semantic Segmentation in Street Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Changsheng Xu,et al.  Matching-CNN meets KNN: Quasi-parametric human parsing , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Yi Yang,et al.  Attention to Scale: Scale-Aware Semantic Image Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).