Paper Information - Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video

This paper addresses the challenge of 3D full-body human pose estimation from a monocular image sequence. Two cases are considered: (i) the image locations of the human joints are provided, and (ii) the image locations of the joints are unknown. For the first case, a novel approach is introduced that integrates a sparsity-driven 3D geometric prior with temporal smoothness. For the second case, this approach is extended by treating the image locations of the joints as latent variables, which accounts for the considerable uncertainty in the 2D joint locations. A deep fully convolutional network is trained to predict the uncertainty maps of the 2D joint locations. The 3D pose estimates are obtained via an Expectation-Maximization algorithm over the entire sequence, where it is shown that the 2D joint location uncertainties can be conveniently marginalized out during inference. Empirical evaluation on the Human3.6M dataset shows that the proposed approaches achieve higher 3D pose estimation accuracy than state-of-the-art baselines. Further, the proposed approach outperforms a publicly available 2D pose estimation baseline on the challenging PennAction dataset.
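
To make the first case (known 2D joint locations) more concrete, the following is a minimal sketch of how a sparsity prior and temporal smoothness can be combined in one objective. It is not the paper's algorithm. It assumes an orthographic camera with a fixed, known viewpoint, an L1 penalty on the coefficients of a pose dictionary, a squared penalty on frame-to-frame coefficient changes, and a plain proximal-gradient solver. It does not cover camera estimation, the convolutional network, or the Expectation-Maximization step. All names (`fit_coefficients`, `bases`, `alpha`, `beta`) are hypothetical.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def objective(C, P, W, D, alpha, beta):
    """Reprojection error + L1 sparsity + temporal smoothness of C."""
    resid = C @ P - W
    return (0.5 * np.sum(resid ** 2)
            + alpha * np.abs(C).sum()
            + 0.5 * beta * np.sum((D @ C) ** 2))


def fit_coefficients(W2d, bases, alpha=0.1, beta=1.0, iters=200):
    """Estimate per-frame dictionary coefficients from 2D joints.

    W2d:   (T, 2, J) observed 2D joint locations for T frames, J joints.
    bases: (K, 3, J) dictionary of K basis poses (assumed given).
    Returns C of shape (T, K) and the objective value after each iteration.
    """
    T, _, J = W2d.shape
    K = bases.shape[0]
    # Assumed orthographic projection with a fixed camera: keep x and y rows.
    P = bases[:, :2, :].reshape(K, 2 * J)
    W = W2d.reshape(T, 2 * J)
    # First-order temporal difference operator, shape (T-1, T).
    D = np.diff(np.eye(T), axis=0)
    # Step 1/L, where L bounds the Lipschitz constant of the smooth part
    # (the largest eigenvalue of D^T D is at most 4).
    step = 1.0 / (np.linalg.norm(P, 2) ** 2 + 4.0 * beta)
    C = np.zeros((T, K))
    history = []
    for _ in range(iters):
        grad = (C @ P - W) @ P.T + beta * (D.T @ (D @ C))
        C = soft_threshold(C - step * grad, step * alpha)
        history.append(objective(C, P, W, D, alpha, beta))
    return C, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, K, J = 20, 8, 15
    bases = rng.normal(size=(K, 3, J))
    # Synthetic ground truth: two active basis poses, slowly varying in time.
    C_true = np.zeros((T, K))
    C_true[:, 0] = np.linspace(1.0, 0.0, T)
    C_true[:, 3] = np.linspace(0.0, 1.0, T)
    poses3d = np.einsum("tk,kcj->tcj", C_true, bases)
    W2d = poses3d[:, :2, :] + 0.01 * rng.normal(size=(T, 2, J))
    C_hat, history = fit_coefficients(W2d, bases)
    print("final objective:", history[-1])
    print("nonzero coefficients per frame:", (np.abs(C_hat) > 1e-6).sum(axis=1))
```

With the step size chosen as above, the objective is non-increasing across iterations, which gives a simple sanity check. Under the same assumptions, the 3D pose for frame `t` would be reconstructed as `np.einsum("k,kcj->cj", C_hat[t], bases)`.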
