Reweighted sparse representation with residual compensation for 3D human pose estimation from a single RGB image

Abstract 3D human pose estimation from 2D joints of an image is a worthwhile and challenging research topic. Since a specific 2D pose could be projected from various 3D poses, the ambiguity becomes a difficult obstacle when recovering 3D pose from 2D. Many supervised learning solutions have been proposed in recent years, however, most of them require an abundant of well-annotated training samples to get satisfied estimation performance. In this paper, an unsupervised approach that built upon sparse representation (SR) is presented and its two enhancement schemes are provided. As the first scheme, the reweighting one improves the sparsity of the SR model which leads to a more accurate solution. Furthermore, based on the resulting minimization residual of the loss function in 2D, the discrepancy between the estimated 3D pose and the target 3D pose is used to adjust the estimated 3D pose to reach a better accuracy. Comprehensive experiments have been conducted on four well-recognized benchmarks for evaluation. Significant and consistent improvements over existing SR models are observed in our experiments. Furthermore, the proposed approach even outperforms many supervised learning works.

[1]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[2]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Jitendra Malik,et al.  Estimating Human Body Configurations Using Shape Context Matching , 2002, ECCV.

[4]  Jun Yu,et al.  Exemplar-based 3D human pose estimation with sparse spectral embedding , 2017, Neurocomputing.

[5]  Cristian Sminchisescu,et al.  Twin Gaussian Processes for Structured Prediction , 2010, International Journal of Computer Vision.

[6]  Peng Wang,et al.  Appearance based pedestrians' head pose and body orientation estimation using deep learning , 2018, Neurocomputing.

[7]  Youjie Zhou,et al.  Pose Locality Constrained Representation for 3D Human Pose Reconstruction , 2014, ECCV.

[8]  B. Triggs,et al.  3D human pose from silhouettes by relevance vector regression , 2004, CVPR 2004.

[9]  Ilya Kostrikov,et al.  Depth Sweep Regression Forests for Estimating 3D Human Pose from Images , 2014, BMVC.

[10]  Wei Zhang,et al.  Deep Kinematic Pose Regression , 2016, ECCV Workshops.

[11]  Xue Li,et al.  Blockwise coordinate descent schemes for efficient and effective dictionary learning , 2016, Neurocomputing.

[12]  Ankur Agarwal,et al.  Recovering 3D human pose from monocular images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Cordelia Schmid,et al.  LCR-Net: Localization-Classification-Regression for Human Pose , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Deva Ramanan,et al.  3D Human Pose Estimation = 2D Pose Estimation + Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Trevor Darrell,et al.  Inferring 3D structure with a statistical image-based shape model , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[16]  Jianping Fan,et al.  Leveraging Content Sensitiveness and User Trustworthiness to Recommend Fine-Grained Privacy Settings for Social Image Sharing , 2018, IEEE Transactions on Information Forensics and Security.

[17]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Francesc Moreno-Noguer,et al.  Single image 3D human pose estimation from noisy observations , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  M. Elad,et al.  $rm K$-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation , 2006, IEEE Transactions on Signal Processing.

[20]  Mohan S. Kankanhalli,et al.  Marker-Less 3D Human Motion Capture with Monocular Image Sequence and Height-Maps , 2016, ECCV.

[21]  Xiaowei Zhou,et al.  Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Yao Lu,et al.  Video pose estimation with global motion cues , 2017, Neurocomputing.

[23]  Bamdev Mishra,et al.  Manopt, a matlab toolbox for optimization on manifolds , 2013, J. Mach. Learn. Res..

[24]  Xiaogang Wang,et al.  Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Yichen Wei,et al.  Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Song-Chun Zhu,et al.  Monocular 3D Human Pose Estimation by Predicting Depth on Joints , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Alain Rakotomamonjy,et al.  Applying alternating direction method of multipliers for constrained dictionary learning , 2013, Neurocomputing.

[28]  Xiaowei Zhou,et al.  MonoCap: Monocular Human Motion Capture using a CNN Coupled with a Geometric Prior , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Cristian Sminchisescu,et al.  Discriminative density propagation for 3D human motion estimation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30]  Xiaowei Zhou,et al.  Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Jun Yu,et al.  Multimodal Face-Pose Estimation With Multitask Manifold Deep Learning , 2019, IEEE Transactions on Industrial Informatics.

[32]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[33]  Antoni B. Chan,et al.  3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network , 2014, ACCV.

[34]  Francesc Moreno-Noguer,et al.  A Joint Model for 2D and 3D Pose Estimation from a Single Image , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Xiaowei Zhou,et al.  Sparse Representation for 3D Shape Estimation: A Convex Relaxation Approach , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  James J. Little,et al.  A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Vincent Lepetit,et al.  Direct Prediction of 3D Body Poses from Motion Compensated Sequences , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Juergen Gall,et al.  A Dual-Source Approach for 3D Human Pose Estimation from a Single Image , 2017, Comput. Vis. Image Underst..

[39]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[40]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Keqin Li,et al.  Maxdenominator Reweighted Sparse Representation for Tumor Classification , 2017, Scientific reports.

[42]  Lourdes Agapito,et al.  Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Y. Nesterov Gradient methods for minimizing composite objective function , 2007 .

[44]  Stephen P. Boyd,et al.  Enhancing Sparsity by Reweighted ℓ1 Minimization , 2007, 0711.1612.

[45]  Antoni B. Chan,et al.  Maximum-Margin Structured Learning with Deep Networks for 3D Human Pose Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[46]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[47]  Wen Gao,et al.  Robust Estimation of 3D Human Poses from a Single Image , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Rómer Rosales,et al.  Estimating 3D body pose using uncalibrated cameras , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[49]  Andrew Zisserman,et al.  Flowing ConvNets for Human Pose Estimation in Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[50]  Yin Wang,et al.  Self Scaled Regularized Robust Regression , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Juergen Gall,et al.  A Dual-Source Approach for 3D Pose Estimation from a Single Image , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Vincent Lepetit,et al.  Learning Latent Representations of 3D Human Pose with Deep Neural Networks , 2018, International Journal of Computer Vision.

[53]  Juan José Pantrigo,et al.  Multiview 3D human pose estimation using improved least-squares and LSTM networks , 2019, Neurocomputing.

[54]  Pascal Fua,et al.  Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[55]  Francesc Moreno-Noguer,et al.  3D Human Pose Estimation from a Single Image via Distance Matrix Regression , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Meng Wang,et al.  Multimodal Deep Autoencoder for Human Pose Recovery , 2015, IEEE Transactions on Image Processing.

[57]  Qingmin Liao,et al.  Depth-images-based pose estimation using regression forests and graphical models , 2015, Neurocomputing.

[58]  Chen Qian,et al.  3D Human Pose Machines with Self-Supervised Learning , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  Liangpei Zhang,et al.  Total Variation Regularized Reweighted Sparse Nonnegative Matrix Factorization for Hyperspectral Unmixing , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[60]  T. Kanade,et al.  Reconstructing 3D Human Pose from 2D Image Landmarks , 2012, ECCV.

[61]  Vincent Lepetit,et al.  Structured Prediction of 3D Human Pose with Deep Neural Networks , 2016, BMVC.

[62]  Dariu Gavrila,et al.  Multi-view 3D Human Pose Estimation in Complex Environment , 2011, International Journal of Computer Vision.

[63]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion , 2006 .

[64]  Jun Yu,et al.  Human pose recovery by supervised spectral embedding , 2015, Neurocomputing.

[65]  Hao Jiang 3D Human Pose Reconstruction Using Millions of Exemplars , 2010, 2010 20th International Conference on Pattern Recognition.

[66]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[67]  Stefan Carlsson,et al.  3D Pictorial Structures for Multiple View Articulated Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[68]  Michael J. Black,et al.  Pose-conditioned joint angle limits for 3D human pose reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[69]  Roland Göcke,et al.  Monocular Image 3D Human Pose Estimation under Self-Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.