Synthesizing Training Images for Boosting Human 3D Pose Estimation

Human 3D pose estimation from a single image is a challenging task with numerous applications. Convolutional Neural Networks (CNNs) have recently achieved superior performance on the task of 2D pose estimation from a single image, by training on images with 2D annotations collected by crowd sourcing. This suggests that similar success could be achieved for direct estimation of 3D poses. However, 3D poses are much harder to annotate, and the lack of suitable annotated training images hinders attempts towards end-to-end solutions. To address this issue, we opt to automatically synthesize training images with ground truth pose annotations. Our work is a systematic study along this road. We find that pose space coverage and texture diversity are the key ingredients for the effectiveness of synthetic training data. We present a fully automatic, scalable approach that samples the human pose space for guiding the synthesis procedure and extracts clothing textures from real images. Furthermore, we explore domain adaptation for bridging the gap between our synthetic training images and real testing photos. We demonstrate that CNNs trained with our synthetic images out-perform those trained with real photos on 3D pose estimation tasks.

[1]  Trevor Darrell,et al.  Simultaneous Deep Transfer Across Domains and Tasks , 2015, ICCV.

[2]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[3]  D. Cohen-Or,et al.  Parametric reshaping of human bodies in images , 2010, ACM Trans. Graph..

[4]  Scott Schaefer,et al.  Image deformation using moving least squares , 2006, ACM Trans. Graph..

[5]  Xiaowei Zhou,et al.  3D Shape Reconstruction from 2D Landmarks: A Convex Formulation , 2014, ArXiv.

[6]  Mathieu Aubry,et al.  Deep Exemplar 2D-3D Detection by Adapting from Real to Rendered Views , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  David C. Hogg Model-based vision: a program to see a walking person , 1983, Image Vis. Comput..

[9]  Sebastian Thrun,et al.  SCAPE: shape completion and animation of people , 2005, SIGGRAPH '05.

[10]  Mark Everingham,et al.  Learning effective human pose estimation from inaccurate annotation , 2011, CVPR 2011.

[11]  Andrew P. Witkin,et al.  Large steps in cloth simulation , 1998, SIGGRAPH.

[12]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Ankur Agarwal,et al.  Recovering 3D human pose from monocular images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[16]  Yule Yuan,et al.  Automatic foreground extraction of clothing images based on GrabCut in massive images , 2012, 2012 IEEE International Conference on Information Science and Technology.

[17]  Alan L. Yuille,et al.  Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations , 2014, NIPS.

[18]  Kang Zheng,et al.  Combining local appearance and holistic view: Dual-Source Deep Neural Networks for human pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Jitendra Malik,et al.  Recovering 3D human body configurations using shape contexts , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Jonathan Tompson,et al.  MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation , 2014, ACCV.

[21]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Cristian Sminchisescu,et al.  Latent structured models for human pose estimation , 2011, 2011 International Conference on Computer Vision.

[23]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[24]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[25]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[26]  Hans-Peter Seidel,et al.  MovieReshape: tracking and reshaping of humans in videos , 2010, ACM Trans. Graph..

[27]  Bernt Schiele,et al.  Articulated people detection and pose estimation: Reshaping the future , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Cristian Sminchisescu,et al.  Iterated Second-Order Label Sensitive Pooling for 3D Human Pose Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Leonidas J. Guibas,et al.  Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  Xiaowei Zhou,et al.  Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Peter V. Gehler,et al.  DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Deva Ramanan,et al.  Articulated pose estimation with tiny synthetic videos , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[33]  Jitendra Malik,et al.  R-CNNs for Pose Estimation and Action Detection , 2014, ArXiv.

[34]  Antoni B. Chan,et al.  3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network , 2014, ACCV.

[35]  Pietro Perona,et al.  Continuous dynamic time warping for translation-invariant curve alignment with applications to signature verification , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[36]  Michael J. Black,et al.  Estimating human shape and pose from a single image , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[37]  Vincent Lepetit,et al.  Direct Prediction of 3D Body Poses from Motion Compensated Sequences , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Michael J. Black,et al.  DRAPE: DRessing Any PErson , 2012, ACM Trans. Graph..

[39]  Koby Crammer,et al.  A theory of learning from different domains , 2010, Machine Learning.

[40]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Youjie Zhou,et al.  Pose Locality Constrained Representation for 3D Human Pose Reconstruction , 2014, ECCV.

[42]  Antoni B. Chan,et al.  Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[43]  Hsi-Jian Lee,et al.  Determination of 3D human body postures from a single view , 1985, Comput. Vis. Graph. Image Process..

[44]  Michael J. Black,et al.  Home 3D body scans from noisy image and range data , 2011, 2011 International Conference on Computer Vision.

[45]  Juergen Gall,et al.  A Dual-Source Approach for 3D Pose Estimation from a Single Image , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Michael J. Black,et al.  Pose-conditioned joint angle limits for 3D human pose reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Michael I. Jordan,et al.  Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.

[48]  Roland Göcke,et al.  Monocular Image 3D Human Pose Estimation under Self-Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.

[49]  T. Kanade,et al.  Reconstructing 3D Human Pose from 2D Image Landmarks , 2012, ECCV.

[50]  Antonio M. López,et al.  Virtual and Real World Adaptation for Pedestrian Detection , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Camillo J. Taylor,et al.  Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image , 2000, Comput. Vis. Image Underst..

[52]  Sebastian Nowozin,et al.  A Non-parametric Bayesian Network Prior of Human Pose , 2013, 2013 IEEE International Conference on Computer Vision.

[53]  Ben Taskar,et al.  MODEC: Multimodal Decomposable Models for Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  François Laviolette,et al.  Domain-Adversarial Neural Networks , 2014, ArXiv.

[55]  Camillo J. Taylor,et al.  Reconstruction of articulated objects from point correspondences in a single uncalibrated image , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[56]  Ivor W. Tsang,et al.  Learning With Augmented Features for Supervised and Semi-Supervised Heterogeneous Domain Adaptation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57]  Jitendra Malik,et al.  Estimating Human Body Configurations Using Shape Context Matching , 2002, ECCV.

[58]  Trevor Darrell,et al.  Deep Domain Confusion: Maximizing for Domain Invariance , 2014, CVPR 2014.

[59]  Antoni B. Chan,et al.  Maximum-Margin Structured Learning with Deep Networks for 3D Human Pose Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[60]  Wen Gao,et al.  Robust Estimation of 3D Human Poses from a Single Image , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[61]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.