Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image

We propose a unified formulation for the problem of 3D human pose estimation from a single raw RGB image that reasons jointly about 2D joint estimation and 3D pose reconstruction to improve both tasks. We take an integrated approach that fuses probabilistic knowledge of 3D human pose with a multi-stage CNN architecture and uses the knowledge of plausible 3D landmark locations to refine the search for better 2D locations. The entire process is trained end-to-end, is extremely efficient and obtains state-of-the-art results on Human3.6M outperforming previous approaches both on 2D and 3D errors.

[1]  Cordelia Schmid,et al.  MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild , 2016, NIPS.

[2]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Youjie Zhou,et al.  Pose Locality Constrained Representation for 3D Human Pose Reconstruction , 2014, ECCV.

[5]  Wei Zhang,et al.  Deep Kinematic Pose Regression , 2016, ECCV Workshops.

[6]  Ankur Agarwal,et al.  Recovering 3D human pose from monocular images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Michael J. Black,et al.  Pose-conditioned joint angle limits for 3D human pose reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Takeo Kanade,et al.  Shape and motion from image streams under orthography: a factorization method , 1992, International Journal of Computer Vision.

[9]  Camillo J. Taylor,et al.  Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image , 2000, Comput. Vis. Image Underst..

[10]  Ahmed M. Elgammal,et al.  Inferring 3D body pose from silhouettes using activity manifold learning , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[11]  Songhwai Oh,et al.  Complex Non-rigid 3D Shape Recovery Using a Procrustean Normal Distribution Mixture Model , 2015, International Journal of Computer Vision.

[12]  Ioannis A. Kakadiaris,et al.  Estimating Anthropometry and Pose from a Single Uncalibrated Image , 2001, Comput. Vis. Image Underst..

[13]  Chong-Ho Choi,et al.  Procrustean Normal Distribution for Non-Rigid Structure from Motion , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Andrew Zisserman,et al.  Flowing ConvNets for Human Pose Estimation in Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Hsi-Jian Lee,et al.  Determination of 3D human body postures from a single view , 1985, Comput. Vis. Graph. Image Process..

[16]  Rama Chellappa,et al.  View independent human body pose estimation from a single perspective image , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[17]  Juergen Gall,et al.  A Dual-Source Approach for 3D Pose Estimation from a Single Image , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Takeo Kanade,et al.  Trajectory Space: A Dual Representation for Nonrigid Structure from Motion , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Varun Ramakrishna,et al.  Pose Machines: Articulated Pose Estimation via Inference Machines , 2014, ECCV.

[20]  Henning Biermann,et al.  Recovering non-rigid 3D shape from image streams , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[21]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[22]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Michael E. Tipping,et al.  Probabilistic Principal Component Analysis , 1999 .

[24]  Jitendra Malik,et al.  Recovering 3D human body configurations using shape contexts , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Aleix M. Martínez,et al.  Computing Smooth Time Trajectories for Camera and Deformable Shape in Structure from Motion with Occlusion , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Francesc Moreno-Noguer,et al.  Single image 3D human pose estimation from noisy observations , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Vincent Lepetit,et al.  Predicting People's 3D Poses from Short Sequences , 2015, ArXiv.

[28]  Jonathan Tompson,et al.  Learning Human Pose Estimation Features with Convolutional Networks , 2013, ICLR.

[29]  Antoni B. Chan,et al.  3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network , 2014, ACCV.

[30]  Yan Wang,et al.  A Simple, Fast and Highly-Accurate Algorithm to Recover 3D Shape from 2D Landmarks on a Single Image , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Xiaowei Zhou,et al.  Sparse Representation for 3D Shape Estimation: A Convex Relaxation Approach , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[33]  Antoni B. Chan,et al.  Maximum-Margin Structured Learning with Deep Networks for 3D Human Pose Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[34]  Wen Gao,et al.  Robust Estimation of 3D Human Poses from a Single Image , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[36]  Yuandong Tian,et al.  Single Image 3D Interpreter Network , 2016, ECCV.

[37]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[38]  John Flynn,et al.  Representing Data by a Mixture of Activated Simplices , 2014, ArXiv.

[39]  Alan L. Yuille,et al.  Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations , 2014, NIPS.

[40]  T. Kanade,et al.  Reconstructing 3D Human Pose from 2D Image Landmarks , 2012, ECCV.

[41]  Vincent Lepetit,et al.  Structured Prediction of 3D Human Pose with Deep Neural Networks , 2016, BMVC.

[42]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Fiora Pirri,et al.  Bayesian Image Based 3D Pose Estimation , 2016, ECCV.

[44]  Neil D. Lawrence,et al.  Gaussian Process Latent Variable Models for Human Pose Estimation , 2007, MLMI.

[45]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[46]  Lourdes Agapito,et al.  Learning a Manifold as an Atlas , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Xiaowei Zhou,et al.  Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  T. Kanade,et al.  Reconstructing 3 D Human Pose from 2 D Image Landmarks , 2012 .

[49]  A. Elgammal,et al.  Inferring 3D body pose from silhouettes using activity manifold learning , 2004, CVPR 2004.

[50]  David J. Fleet,et al.  Shared Kernel Information Embedding for discriminative inference , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[51]  Pascal Fua,et al.  Fusing 2D Uncertainty and 3D Cues for Monocular Body Pose Estimation , 2016, ArXiv.