Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction

As it is hard to calibrate single-view RGB images in the wild, existing 3D human mesh reconstruction (3DHMR) methods either use a constant large focal length or estimate one based on the background environment context, which can not tackle the problem of the torso, limb, hand or face distortion caused by perspective camera projection when the camera is close to the human body. The naive focal length assumptions can harm this task with the incorrectly formulated projection matrices. To solve this, we propose Zolly, the first 3DHMR method focusing on perspective-distorted images. Our approach begins with analysing the reason for perspective distortion, which we find is mainly caused by the relative location of the human body to the camera center. We propose a new camera model and a novel 2D representation, termed distortion image, which describes the 2D dense distortion scale of the human body. We then estimate the distance from distortion scale features rather than environment context features. Afterwards, we integrate the distortion feature with image features to reconstruct the body mesh. To formulate the correct projection matrix and locate the human body position, we simultaneously use perspective and weak-perspective projection loss. Since existing datasets could not handle this task, we propose the first synthetic dataset PDHuman and extend two real-world datasets tailored for this task, all containing perspective-distorted human images. Extensive experiments show that Zolly outperforms existing state-of-the-art methods on both perspective-distorted datasets and the standard benchmark (3DPW).

[1]  Yuguo Li,et al.  Close contact behaviors of university and school students in 10 indoor environments. , 2023, Journal of hazardous materials.

[2]  Michael J. Black,et al.  BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion , 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Shiqing Xin,et al.  Globally Consistent Normal Orientation for Point Clouds by Regularizing the Winding-Number Field , 2023, ACM Trans. Graph..

[4]  Dahua Lin,et al.  SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling , 2023, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Liang Pan,et al.  SHERF: Generalizable Human NeRF from a Single Image , 2023, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Chu-Hsing Lin,et al.  TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer , 2022, ArXiv.

[7]  Jianzhuang Liu,et al.  CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation , 2022, ECCV.

[8]  Tae-Hyun Oh,et al.  Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers , 2022, ECCV.

[9]  Hongwen Zhang,et al.  PyMAF-X: Towards Well-Aligned Full-Body Model Regression From Monocular Images , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Michael J. Black,et al.  Accurate 3D Body Shape Regression using Metric and Semantic Attributes , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Chen Change Loy,et al.  HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling , 2022, ECCV.

[12]  Kris Kitani,et al.  Occluded Human Mesh Recovery , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Bo Dai,et al.  DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation , 2022, ECCV.

[14]  Jianyi Wang,et al.  SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos , 2021, ECCV.

[15]  Cristian Sminchisescu,et al.  HSPACE: Synthetic Parametric Humans Animated in Complex Environments , 2021, ArXiv.

[16]  Michael J. Black,et al.  ICON: Implicit Clothed humans Obtained from Normals , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Michael J. Black,et al.  Putting People in their Place: Monocular Regression of 3D People in Depth , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Chen Change Loy,et al.  Playing for 3D Human Recovery , 2021, ArXiv.

[19]  Taku Komura,et al.  Coverage Axis: Inner Point Selection for 3D Shape Skeletonization , 2021, Comput. Graph. Forum.

[20]  Michael J. Black,et al.  SPEC: Seeing People in the Wild with an Estimated Camera , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Bingbing Ni,et al.  Skeleton2Mesh: Kinematics Prior Injected Unsupervised Human Mesh Recovery , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Jiashi Feng,et al.  PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  A. Dosovitskiy,et al.  MLP-Mixer: An all-MLP Architecture for Vision , 2021, NeurIPS.

[24]  Joachim Tesch,et al.  AGORA: Avatars in Geography Optimized for Regression Analysis , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Michael J. Black,et al.  PARE: Part Attention Regressor for 3D Human Body Estimation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[26]  Michael J. Black,et al.  On Self-Contact and Human Pose , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Lijuan Wang,et al.  Mesh Graphormer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Hujun Bao,et al.  Reconstructing 3D Human Pose by Watching Humans in the Mirror , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Zhenan Sun,et al.  PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Hujun Bao,et al.  Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Cewu Lu,et al.  HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Lourdes Agapito,et al.  SelfPose: 3D Egocentric Pose Estimation From a Headset Mounted Camera , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Lior Fritz,et al.  Beyond Weak Perspective for Monocular 3D Human Pose Estimation , 2020, ECCV Workshops.

[34]  Michael J. Black,et al.  Monocular, One-stage, Regression of Multiple 3D People , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Wanli Ouyang,et al.  3D Human Mesh Regression With Dense Correspondence , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Yangang Wang,et al.  Object-Occluded Human Shape and Pose Estimation From a Single Color Image , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Andrea Vedaldi,et al.  Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation , 2020, 2021 International Conference on 3D Vision (3DV).

[38]  Michael J. Black,et al.  VIBE: Video Inference for Human Body Pose and Shape Estimation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Wan-Yen Lo,et al.  Accelerating 3D deep learning with PyTorch3D , 2019, SIGGRAPH Asia 2020 Courses.

[40]  Shuangmin Chen,et al.  Top-Down Shape Abstraction Based on Greedy Pole Selection , 2019, IEEE Transactions on Visualization and Computer Graphics.

[41]  Michael J. Black,et al.  Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[42]  Kostas Daniilidis,et al.  Convolutional Mesh Regression for Single-Image Human Shape Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Dong Liu,et al.  Deep High-Resolution Representation Learning for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Peter V. Gehler,et al.  Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation , 2018, 2018 International Conference on 3D Vision (3DV).

[45]  Xiaowei Zhou,et al.  Learning to Estimate 3D Human Pose and Shape from a Single Color Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Christian Theobalt,et al.  Single-Shot Multi-person 3D Pose Estimation from Monocular RGB , 2017, 2018 International Conference on 3D Vision (3DV).

[48]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[49]  James J. Little,et al.  A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[50]  Cordelia Schmid,et al.  Learning from Synthetic Humans , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Serge J. Belongie,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Pascal Fua,et al.  Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision , 2016, 2017 International Conference on 3D Vision (3DV).

[53]  F. Moreno-Noguer 3D Human Pose Estimation from a Single Image via Distance Matrix Regression , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[55]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[57]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[58]  Michael J. Black,et al.  MoSh: motion and shape capture from sparse markers , 2014, ACM Trans. Graph..

[59]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[61]  Mark Everingham,et al.  Learning effective human pose estimation from inaccurate annotation , 2011, CVPR 2011.

[62]  Bodo Rosenhahn,et al.  Supplementary Material to: Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera , 2018 .

[63]  Christopher K. I. Williams,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.