Learning to Segment Humans by Stacking Their Body Parts

Human segmentation in still images is a complex task due to the wide range of body poses and drastic changes in environmental conditions. Human body segmentation is usually treated in two stages: first, a body part detection step is performed, and then the resulting part detections serve as prior knowledge that a segmentation strategy refines. In this paper, we present a two-stage scheme based on Multi-Scale Stacked Sequential Learning (MSSL). We define an extended feature set by stacking a multi-scale decomposition of body part likelihood maps. In the first stage, these likelihood maps are obtained by means of an Error-Correcting Output Codes (ECOC) ensemble of soft body part detectors. In the second stage, a binary classifier learns the contextual relations among part predictions and produces an accurate body confidence map. This confidence map is then fed to a graph cut optimization procedure to obtain the final segmentation. Results show improved segmentation when MSSL is included in the human segmentation pipeline.
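
As a rough illustration of the extended feature set described above, the sketch below stacks a multi-scale decomposition of body part likelihood maps into one feature vector per pixel. It is a minimal sketch under stated assumptions, not the paper's implementation: Gaussian smoothing as the decomposition, the scale values, and the function names `gaussian_kernel`, `smooth`, and `stack_multiscale` are all hypothetical choices made for illustration. The second-stage binary classifier and the graph cut step are not shown.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel (assumed smoothing filter)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def smooth(image, sigma):
    """Separable Gaussian smoothing of a 2-D map; output has the input's shape."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    # Replicate border values so that 'valid' convolution restores the input size.
    padded = np.pad(image, radius, mode="edge")
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows
    )


def stack_multiscale(likelihood_maps, sigmas=(1.0, 2.0, 4.0)):
    """Stack every part likelihood map at every scale.

    likelihood_maps: array of shape (num_parts, height, width), one map per
        body part (assumed to come from the first-stage part detectors).
    Returns an array of shape (height, width, num_parts * len(sigmas)), i.e.
    one extended feature vector per pixel.
    """
    channels = [smooth(m, s) for m in likelihood_maps for s in sigmas]
    return np.stack(channels, axis=-1)


if __name__ == "__main__":
    # Hypothetical input: 6 body part likelihood maps for a 48x32 image.
    maps = np.random.rand(6, 48, 32)
    features = stack_multiscale(maps)
    print(features.shape)
```

Each pixel's feature vector could then be passed to any binary classifier to estimate a body confidence value; which classifier and which neighborhood sampling to use are left open here.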
