Propagating LSTM: 3D Pose Estimation Based on Joint Interdependency

We present a novel 3D pose estimation method based on joint interdependency (JI) for acquiring 3D joints from the human pose of an RGB image. The JI incorporates the body part based structural connectivity of joints to learn the high spatial correlation of human posture on our method. Towards this goal, we propose a new long short-term memory (LSTM)-based deep learning architecture named propagating LSTM networks (p-LSTMs), where each LSTM is connected sequentially to reconstruct 3D depth from the centroid to edge joints through learning the intrinsic JI. In the first LSTM, the seed joints of 3D pose are created and reconstructed into the whole-body joints through the connected LSTMs. Utilizing the p-LSTMs, we achieve the higher accuracy of about 11.2% than state-of-the-art methods on the largest publicly available database. Importantly, we demonstrate that the JI drastically reduces the structural errors at body edges, thereby leads to a significant improvement.

[1]  Juergen Gall,et al.  A Dual-Source Approach for 3D Pose Estimation from a Single Image , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Yichen Wei,et al.  Compositional Human Pose Regression , 2018, Comput. Vis. Image Underst..

[4]  Ankur Agarwal,et al.  3D human pose from silhouettes by relevance vector regression , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[5]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Simon Weidert,et al.  Kinect for interactive AR anatomy learning , 2013, 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR).

[7]  H. Emrah Tasli,et al.  Human Pose Estimation in Space and Time Using 3D CNN , 2016, ECCV Workshops.

[8]  Hui Cheng,et al.  Recurrent 3D Pose Sequence Machines , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Alan Conrad Bovik,et al.  Enhancement of Visual Comfort and Sense of Presence on Stereoscopic 3D Images , 2017, IEEE Transactions on Image Processing.

[10]  Lourdes Agapito,et al.  Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Roland Göcke,et al.  Monocular Image 3D Human Pose Estimation under Self-Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  Mir Rayat Imtiaz Hossain Understanding the sources of error for 3D human pose estimation from monocular images and videos , 2017 .

[13]  Sanghoon Lee,et al.  Ensemble Deep Learning for Skeleton-Based Action Recognition Using Temporal Sliding LSTM Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Antoni B. Chan,et al.  3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network , 2014, ACCV.

[15]  James J. Little,et al.  A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  M. Westoby,et al.  ‘Structure-from-Motion’ photogrammetry: A low-cost, effective tool for geoscience applications , 2012 .

[17]  Ilya Kostrikov,et al.  Depth Sweep Regression Forests for Estimating 3D Human Pose from Images , 2014, BMVC.

[18]  Jitendra Malik,et al.  Recovering 3D human body configurations using shape contexts , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  J. Gower Generalized procrustes analysis , 1975 .

[20]  Alan C. Bovik,et al.  Stereoscopic 3D Visual Discomfort Prediction: A Dynamic Accommodation and Vergence Interaction Model , 2016, IEEE Transactions on Image Processing.

[21]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[22]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[23]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Kwanghyun Lee,et al.  A New Framework for Measuring 2D and 3D Visual Information in Terms of Entropy , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[25]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[26]  Francesc Moreno-Noguer,et al.  A Joint Model for 2D and 3D Pose Estimation from a Single Image , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Vincent Lepetit,et al.  Direct Prediction of 3D Body Poses from Motion Compensated Sequences , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[29]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[30]  Francisco Javier Díaz Pernas,et al.  A Kinect-based system for cognitive rehabilitation exercises monitoring , 2014, Comput. Methods Programs Biomed..

[31]  Ligang Liu,et al.  Scanning 3D Full Human Bodies Using Kinects , 2012, IEEE Transactions on Visualization and Computer Graphics.

[32]  Yang Koo Lee,et al.  Implementation of a Virtual Training Simulator Based on 360° Multi-View Human Action Recognition , 2017, IEEE Access.

[33]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[34]  Cristian Sminchisescu,et al.  Fast algorithms for large scale conditional 3D prediction , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Vincent Lepetit,et al.  Structured Prediction of 3D Human Pose with Deep Neural Networks , 2016, BMVC.

[36]  Cristian Sminchisescu,et al.  Deep Multitask Architecture for Integrated 2D and 3D Human Sensing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Philip H. S. Torr,et al.  Randomized trees for human pose detection , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Sanghoon Lee,et al.  Enhanced particle-filtering framework for vessel segmentation and tracking , 2017, Comput. Methods Programs Biomed..

[39]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Wei Zhang,et al.  Deep Kinematic Pose Regression , 2016, ECCV Workshops.

[41]  Deva Ramanan,et al.  3D Human Pose Estimation = 2D Pose Estimation + Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Cordelia Schmid,et al.  MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild , 2016, NIPS.

[43]  Xiaowei Zhou,et al.  Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Peter V. Gehler,et al.  DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Yichen Wei,et al.  Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[46]  Junghwan Kim,et al.  Implementation of Human Action Recognition System Using Multiple Kinect Sensors , 2015, PCM.

[47]  Xiaowei Zhou,et al.  Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Francesc Moreno-Noguer,et al.  3D Human Pose Estimation from a Single Image via Distance Matrix Regression , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Junghwan Kim,et al.  Implementation of an Omnidirectional Human Motion Capture System Using Multiple Kinect Sensors , 2015, IEICE Trans. Fundam. Electron. Commun. Comput. Sci..