Edge-Sensitive Human Cutout With Hierarchical Granularity and Loopy Matting Guidance

Human parsing and matting play important roles in applications such as dress collocation, clothing recommendation, and image editing. In this paper, we propose a lightweight hybrid model that unifies the fully supervised hierarchical-granularity parsing task and the unsupervised matting task. Our model comprises two parts: an extensible hierarchical semantic segmentation block built on a CNN, and a matting module composed of guided filters. Given a human image, stage 1 of the segmentation block first produces a primitive segmentation map that separates the human from the background. The primitive segmentation is then fed into stage 2, together with the original image, to give a rough segmentation of the human body. This procedure is repeated in stage 3 to obtain a refined segmentation. The matting module takes the estimated segmentation maps as input and produces the matting map in a fully unsupervised manner. The resulting matting map is in turn fed back to the CNN in the first block to refine the semantic segmentation results.
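As a rough illustration of how a guided filter can turn a hard segmentation map into a soft, edge-aware matte, the sketch below implements the standard single-channel guided filter in NumPy. It is not the authors' implementation: the function names (`box_mean`, `guided_filter`, `soft_matte`), the grayscale guide, and the default radius and regularization values are all assumptions made for this example.

```python
import numpy as np


def box_mean(x, r):
    """Mean over a (2r+1) x (2r+1) window, using edge padding and an integral image."""
    k = 2 * r + 1
    h, w = x.shape
    padded = np.pad(x, r, mode="edge")
    c = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # prepend a zero row and column
    s = c[k:k + h, k:k + w] - c[:h, k:k + w] - c[k:k + h, :w] + c[:h, :w]
    return s / (k * k)


def guided_filter(guide, src, r=4, eps=1e-3):
    """Single-channel guided filter: smooths `src` while following edges in `guide`."""
    guide = np.asarray(guide, dtype=np.float64)
    src = np.asarray(src, dtype=np.float64)
    mean_g = box_mean(guide, r)
    mean_s = box_mean(src, r)
    var_g = box_mean(guide * guide, r) - mean_g * mean_g
    cov_gs = box_mean(guide * src, r) - mean_g * mean_s
    a = cov_gs / (var_g + eps)   # local linear coefficient
    b = mean_s - a * mean_g      # local offset
    return box_mean(a, r) * guide + box_mean(b, r)


def soft_matte(gray_image, hard_mask, r=4, eps=1e-3):
    """Hypothetical helper: refine a 0/1 foreground mask into an alpha matte in [0, 1]."""
    return np.clip(guided_filter(gray_image, hard_mask, r, eps), 0.0, 1.0)
```

In a pipeline like the one described, `hard_mask` would come from one of the segmentation stages and `gray_image` from the input photo; a larger `r` spreads the transition band further, while a larger `eps` makes the result smoother and less sensitive to image edges.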
