Increasing the Robustness of CNN-Based Human Body Segmentation in Range Images by Modeling Sensor-Specific Artifacts

This paper addresses the problem of human body parts segmentation in range images acquired using a structured-light imaging system. We propose a solution based on a fully convolutional neural network trained on realistic synthetic data that were simulated in a way that closely emulates our structured-light imaging system with its inherent artifacts such as occlusions, noise and missing data. The results on synthetic test data demonstrate quantitatively the performance of our method in identifying 33 body parts, with negligible confusion between the front and back sides of the body and between the left and right limbs. Our experiments highlight the importance of sensor-specific data augmentation in the training set to improve the robustness of the segmentation. Most importantly, when applied to range data actually acquired by our system, the method was capable of accurately segmenting the different body parts with inter-frame consistency in real-time.

[1]  K. Nishi,et al.  Generation of human depth images with body part labels for complex human pose recognition , 2017, Pattern Recognit..

[2]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Sebastian Thrun,et al.  Real-time identification and localization of body parts from depth images , 2010, 2010 IEEE International Conference on Robotics and Automation.

[4]  Qi-Xing Huang,et al.  Dense Human Body Correspondences Using Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Sebastian Thrun,et al.  Real-Time Human Pose Tracking from Range Data , 2012, ECCV.

[6]  Zhenhua Wang,et al.  Synthesizing Training Images for Boosting Human 3D Pose Estimation , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[7]  Wolfram Burgard,et al.  Deep learning for human part discovery in images , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[8]  Luc Van Gool,et al.  Human Pose Estimation Using Body Parts Dependent Joint Regressors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Sebastian Thrun,et al.  SCAPE: shape completion and animation of people , 2005, SIGGRAPH 2005.

[10]  Guy Godin,et al.  High Resolution Projector for 3D Imaging , 2014, 2014 2nd International Conference on 3D Vision.

[11]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[12]  Iasonas Kokkinos,et al.  Accurate Human-Limb Segmentation in RGB-D Images for Intelligent Mobility Assistance Robots , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[13]  Wojciech Matusik,et al.  Articulated mesh animation from multi-view silhouettes , 2008, ACM Trans. Graph..

[14]  J A Sethian,et al.  Computing geodesic paths on manifolds. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[15]  Christian Wolf,et al.  Human body part estimation from depth images via spatially-constrained deep learning , 2014, Pattern Recognition Letters.

[16]  Cordelia Schmid,et al.  Learning from Synthetic Humans , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Bobby Bodenheimer,et al.  Synthesis and evaluation of linear motion transitions , 2008, TOGS.

[18]  Jonathan Tompson,et al.  MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation , 2014, ACCV.

[19]  Ziyan Wu,et al.  DepthSynth: Real-Time Realistic Synthetic Data Generation from CAD Models for 2.5D Recognition , 2017, 2017 International Conference on 3D Vision (3DV).

[20]  Hong Wei,et al.  A survey of human motion analysis using depth imagery , 2013, Pattern Recognit. Lett..

[21]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[22]  Kathleen M. Robinette,et al.  The CAESAR project: a 3-D surface anthropometry survey , 1999, Second International Conference on 3-D Digital Imaging and Modeling (Cat. No.PR00062).

[23]  Ben Taskar,et al.  MODEC: Multimodal Decomposable Models for Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[25]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[26]  Fei-Fei Li,et al.  Towards Viewpoint Invariant 3D Human Pose Estimation , 2016, ECCV.