FuRPE: Learning Full-body Reconstruction from Part Experts

Full-body reconstruction is a fundamental but challenging task. Owing to the lack of annotated data, the performance of existing methods is largely limited. In this paper, we propose a novel method named Full-body Reconstruction from Part Experts (FuRPE) to tackle this issue. In FuRPE, the network is trained using pseudo labels and features generated from part experts. A simple yet effective pseudo ground-truth selection scheme is proposed to extract high-quality pseudo labels. In this way, a large number of existing human body reconstruction datasets can be leveraged to contribute to model training. In addition, an exponential moving average training strategy is introduced to train the network in a self-supervised manner, further boosting the performance of the model. Extensive experiments on several widely used datasets demonstrate the effectiveness of our method over the baseline. Our method achieves state-of-the-art performance. Code will be publicly available for further research.
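
The two training ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the selection criterion or the averaging details, so the confidence threshold, the decay value, and the function names `select_pseudo_labels` and `ema_update` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def select_pseudo_labels(
    candidates: List[Tuple[str, float]], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Keep expert predictions whose confidence meets a threshold.

    Assumption: each candidate is (label_id, confidence). The paper's
    actual selection rule is not given in the abstract.
    """
    return [(label, conf) for label, conf in candidates if conf >= threshold]


def ema_update(
    teacher: Dict[str, float], student: Dict[str, float], decay: float = 0.999
) -> Dict[str, float]:
    """Standard exponential moving average of weights.

    Each teacher weight moves toward the matching student weight:
        teacher <- decay * teacher + (1 - decay) * student
    Weights are plain floats here to keep the sketch dependency-free.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


if __name__ == "__main__":
    # Hypothetical expert outputs: (sample id, confidence score).
    experts = [("hand_001", 0.92), ("face_007", 0.31), ("hand_014", 0.66)]
    print(select_pseudo_labels(experts, threshold=0.5))

    # One EMA step with a deliberately small decay so the change is visible.
    print(ema_update({"w": 1.0}, {"w": 0.0}, decay=0.9))
```

In a setup of this kind, the averaged (teacher) weights typically produce the training targets while the student is optimized by gradient descent; a decay close to 1 makes the teacher change slowly and smooths out noise in individual student updates.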
