A Deformable Mixture Parsing Model with Parselets

In this work, we address the problem of human parsing, namely partitioning the human body into semantic regions, by using the novel Parselet representation. Previous works often consider solving the problem of human pose estimation as the prerequisite of human parsing. We argue that these approaches cannot obtain optimal pixel level parsing due to the inconsistent targets between these tasks. In this paper, we propose to use Parselets as the building blocks of our parsing model. Parselets are a group of parsable segments which can generally be obtained by low-level over-segmentation algorithms and bear strong semantic meaning. We then build a Deformable Mixture Parsing Model (DMPM) for human parsing to simultaneously handle the deformation and multi-modalities of Parselets. The proposed model has two unique characteristics: (1) the possible numerous modalities of Parse let ensembles are exhibited as the ``And-Or" structure of sub-trees, (2) to further solve the practical problem of Parselet occlusion or absence, we directly model the visibility property at some leaf nodes. The DMPM thus directly solves the problem of human parsing by searching for the best graph configuration from a pool of Parse let hypotheses without intermediate tasks. Comprehensive evaluations demonstrate the encouraging performance of the proposed approach.

[1]  Liang Lin,et al.  Trajectory parsing by cluster sampling in spatio-temporal graph , 2009, CVPR.

[2]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[3]  Derek Hoiem,et al.  Category Independent Object Proposals , 2010, ECCV.

[4]  Silvio Savarese,et al.  Articulated part-based model for joint object detection and pose estimation , 2011, 2011 International Conference on Computer Vision.

[5]  Changsheng Xu,et al.  Hi, magic closet, tell me what to wear! , 2012, ACM Multimedia.

[6]  Pushmeet Kohli,et al.  PoseCut: Simultaneous Segmentation and 3D Pose Estimation of Humans Using Dynamic Graph-Cuts , 2006, ECCV.

[7]  Yifei Lu,et al.  Max Margin AND/OR Graph learning for parsing the human body , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  Andrew Zisserman,et al.  Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Cristian Sminchisescu,et al.  CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Jitendra Malik,et al.  Semantic segmentation using regions and parts , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[14]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[15]  Koen E. A. van de Sande,et al.  Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[16]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[18]  Tsuhan Chen,et al.  Clothing cosegmentation for recognizing people , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Pushmeet Kohli,et al.  Simultaneous Segmentation and Pose Estimation of Humans Using Dynamic Graph Cuts , 2008, International Journal of Computer Vision.

[22]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[23]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[24]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Daniel P. Huttenlocher,et al.  Distance Transforms of Sampled Functions , 2012, Theory Comput..

[26]  Jitendra Malik,et al.  Recognition using regions , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[28]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[29]  Changsheng Xu,et al.  Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Tong Zhang,et al.  Clothes search in consumer photos via color matching and attribute learning , 2011, ACM Multimedia.

[31]  Hanqing Lu,et al.  Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Huizhong Chen,et al.  Describing Clothing by Semantic Attributes , 2012, ECCV.

[33]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[34]  Luis E. Ortiz,et al.  Parsing clothing in fashion photographs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.