Paper information - MeTRAbs: Metric-Scale Truncation-Robust Heatmaps for Absolute 3D Human Pose Estimation

Heatmap representations have formed the basis of human pose estimation systems for many years, and their extension to 3D has been a fruitful line of recent research. This includes 2.5D volumetric heatmaps, whose X and Y axes correspond to image space and whose Z axis corresponds to metric depth around the subject. To obtain metric-scale predictions, 2.5D methods need a separate post-processing step to resolve scale ambiguity. Further, they cannot localize body joints outside the image boundaries, leading to incomplete estimates for truncated images. To address these limitations, we propose metric-scale truncation-robust (MeTRo) volumetric heatmaps, whose dimensions are all defined in metric 3D space instead of being aligned with image space. This reinterpretation of heatmap dimensions allows us to directly estimate complete, metric-scale poses without test-time knowledge of distance and without relying on anthropometric heuristics such as bone lengths. To further demonstrate the utility of our representation, we present a differentiable combination of our 3D metric-scale heatmaps with 2D image-space ones to estimate absolute 3D pose (our MeTRAbs architecture). We find that supervision via absolute pose loss is crucial for accurate non-root-relative localization. Using a ResNet-50 backbone without further learned layers, we obtain state-of-the-art results on Human3.6M, MPI-INF-3DHP and MuPoTS-3D. Our code is publicly available at https://vision.rwth-aachen.de/metrabs.
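
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and the abstract does not specify these details. The function names, the 2200 mm cube size, the (z, y, x) axis order and the assumption of a pinhole camera with 2D joints already expressed in normalized image coordinates are all hypothetical choices made for illustration. The first function decodes a joint from a heatmap whose three axes are all metric, using a soft-argmax. The second shows one plausible differentiable way to combine root-relative metric 3D joints with 2D image-space joints: a linear least-squares fit of the translation that makes the projected 3D pose agree with the 2D estimate.

```python
import numpy as np


def soft_argmax_metric(logits, extent_mm=2200.0):
    """Decode one joint from a volumetric heatmap with metric axes.

    Assumption: `logits` has shape (D, H, W) ordered (z, y, x), and every
    axis spans a cube of side `extent_mm` centered on the subject.
    """
    d, h, w = logits.shape
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()

    def centers(n):
        # Voxel-center coordinates in millimeters along one axis.
        return (np.arange(n) + 0.5) / n * extent_mm - extent_mm / 2

    # Expected coordinate along each axis under the marginal distribution.
    x = (probs.sum(axis=(0, 1)) * centers(w)).sum()
    y = (probs.sum(axis=(0, 2)) * centers(h)).sum()
    z = (probs.sum(axis=(1, 2)) * centers(d)).sum()
    return np.array([x, y, z])


def recover_translation(rel3d, img2d):
    """Least-squares translation t so that (rel3d + t) projects onto img2d.

    Assumption: rel3d is (J, 3) in millimeters, img2d is (J, 2) in
    normalized image coordinates (pixel coordinates with the camera
    intrinsics already removed). For each joint the pinhole model gives
        t_x - u * t_z = u * Z - X
        t_y - v * t_z = v * Z - Y
    which is linear in t.
    """
    num_joints = rel3d.shape[0]
    u, v = img2d[:, 0], img2d[:, 1]
    X, Y, Z = rel3d[:, 0], rel3d[:, 1], rel3d[:, 2]

    A = np.zeros((2 * num_joints, 3))
    b = np.zeros(2 * num_joints)
    A[0::2, 0] = 1.0
    A[0::2, 2] = -u
    b[0::2] = u * Z - X
    A[1::2, 1] = 1.0
    A[1::2, 2] = -v
    b[1::2] = v * Z - Y

    t, *_ = np.linalg.lstsq(A, b, rcond=None)
    return t


def absolute_pose(rel3d, img2d):
    """Absolute 3D joints: root-relative metric pose plus fitted translation."""
    return rel3d + recover_translation(rel3d, img2d)
```

Because the least-squares solve is a linear-algebra operation, the same computation written in an automatic-differentiation framework would let a loss on the absolute pose propagate gradients into both the 3D and the 2D predictions, which is the kind of supervision the abstract reports as important.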

B. Leibe | K. Arras | Timm Linder | István Sárándi
