Paper Information - Pose-Guided Human Parsing by an AND/OR Graph Using Pose-Context Features

Parsing humans into semantic parts is crucial to human-centric analysis. In this paper, we propose a human parsing pipeline that uses pose cues, i.e., estimates of human joint locations, to provide pose-guided segment proposals for semantic parts. These segment proposals are ranked using standard appearance cues, a deep-learned semantic feature, and a novel pose feature called pose-context. The ranked proposals are then selected and assembled using an And-Or graph to output a parse of the person. The And-Or graph can handle the large variability in human appearance caused by pose, choice of clothes, etc. We evaluate our approach on the popular Penn-Fudan pedestrian parsing dataset, showing that it significantly outperforms the state of the art, and we perform diagnostics to demonstrate the effectiveness of the different stages of our pipeline.
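
The selection-and-assembly step can be illustrated with a generic And-Or graph inference sketch. This is not the paper's model: the node layout, the part names, the proposal scores, and the `best_parse` helper are all hypothetical, and the sketch uses only per-proposal (unary) scores, whereas a full model would typically also score how well neighbouring parts fit together. An OR node keeps its best-scoring alternative, an AND node sums the scores of all its children, and a leaf picks its highest-ranked segment proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    kind: str                      # "and", "or", or "leaf" (hypothetical node types)
    name: str
    children: List["Node"] = field(default_factory=list)


def best_parse(node: Node, proposal_scores: Dict[str, List[float]]) -> Tuple[float, Dict[str, int]]:
    """Return (score, chosen), where chosen maps each selected leaf to a proposal index."""
    if node.kind == "leaf":
        scores = proposal_scores[node.name]
        idx = max(range(len(scores)), key=scores.__getitem__)
        return scores[idx], {node.name: idx}

    results = [best_parse(child, proposal_scores) for child in node.children]
    if node.kind == "or":
        # OR node: keep only the best alternative (e.g. trousers vs. skirt).
        return max(results, key=lambda r: r[0])

    # AND node: every child is required, so scores add up and choices merge.
    total = sum(score for score, _ in results)
    chosen: Dict[str, int] = {}
    for _, sub_choice in results:
        chosen.update(sub_choice)
    return total, chosen


def leaf(name: str) -> Node:
    return Node("leaf", name)


# Hypothetical toy graph: person = head AND torso AND (two legs OR skirt).
lower = Node("or", "lower", [
    Node("and", "legs", [leaf("left-leg"), leaf("right-leg")]),
    leaf("skirt"),
])
person = Node("and", "person", [leaf("head"), leaf("torso"), lower])

# Made-up ranking scores for the segment proposals of each part.
scores = {
    "head": [0.2, 0.9],
    "torso": [0.7, 0.4],
    "left-leg": [0.3, 0.5],
    "right-leg": [0.6, 0.1],
    "skirt": [0.8],
}

score, chosen = best_parse(person, scores)
print(score, chosen)
```

Because the toy graph is a tree and the scores are independent per part, this bottom-up recursion finds the best configuration exactly; pairwise terms between parts would require carrying more state through the recursion.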
