Paper Information - Learning to Segment Humans by Stacking Their Body Parts

Learning to Segment Humans by Stacking Their Body Parts

Human segmentation in still images is a complex task due to the wide range of body poses and drastic changes in environmental conditions. Human body segmentation is usually treated in a two-stage fashion: first, a body part detection step is performed; then, the part detections are used as prior knowledge to be optimized by segmentation strategies. In this paper, we present a two-stage scheme based on Multi-Scale Stacked Sequential Learning (MSSL). We define an extended feature set by stacking a multi-scale decomposition of body part likelihood maps. These likelihood maps are obtained in a first stage by means of an Error-Correcting Output Codes (ECOC) ensemble of soft body part detectors. In a second stage, contextual relations between part predictions are learnt by a binary classifier, yielding an accurate body confidence map. This confidence map is fed to a graph cut optimization procedure to obtain the final segmentation. Results show improved segmentation when MSSL is included in the human segmentation pipeline.
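The stacking step described in the abstract can be illustrated with a minimal sketch. It assumes each body part already has a per-pixel likelihood map, and it uses a plain box (mean) filter as a stand-in for the multi-scale decomposition, which the abstract does not specify. The names `box_smooth` and `stack_multiscale` and the `radii` parameter are hypothetical, chosen only for this illustration.

```python
from typing import List

Map = List[List[float]]  # one likelihood map: rows x cols, values in [0, 1]


def box_smooth(m: Map, radius: int) -> Map:
    """Mean filter over a (2*radius+1) square window, clipped at the borders.

    Assumption: a box filter stands in for the paper's multi-scale
    decomposition. radius=0 returns the map values unchanged.
    """
    rows, cols = len(m), len(m[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            window = [m[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = sum(window) / len(window)
    return out


def stack_multiscale(part_maps: List[Map], radii: List[int]) -> List[List[List[float]]]:
    """Build one feature vector per pixel with one value per (part, scale) pair.

    Feature order is part-major: [part0@radii[0], part0@radii[1], ..., part1@radii[0], ...].
    """
    rows, cols = len(part_maps[0]), len(part_maps[0][0])
    smoothed = [box_smooth(m, rad) for m in part_maps for rad in radii]
    return [[[s[r][c] for s in smoothed] for c in range(cols)] for r in range(rows)]


if __name__ == "__main__":
    # Two toy 3x3 "part" maps: a single peak and a uniform response.
    peak = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    flat = [[1.0] * 3 for _ in range(3)]
    features = stack_multiscale([peak, flat], radii=[0, 1])
    print(len(features[1][1]))  # 2 parts x 2 scales = 4 features per pixel
```

In a pipeline like the one the abstract outlines, per-pixel vectors of this kind would be the input to the second-stage binary classifier, so that each pixel's prediction can depend on part likelihoods in its neighbourhood at several scales.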

Sergio Escalera | Eloi Puertas | Oriol Pujol | Daniel Sánchez | Miguel Ángel Bautista
