3D Human Pose Estimation via Intuitive Physics

Estimating 3D humans from images often produces implausible bodies that lean, float, or penetrate the floor. Such methods ignore the fact that bodies are typically supported by the scene. Physics engines can be used to enforce physical plausibility, but they are not differentiable, rely on unrealistic proxy bodies, and are difficult to integrate into existing optimization and learning frameworks. In contrast, we exploit novel intuitive-physics (IP) terms that can be inferred from a 3D SMPL body interacting with the scene. Inspired by biomechanics, we infer the pressure heatmap on the body, the Center of Pressure (CoP) from the heatmap, and the SMPL body's Center of Mass (CoM). With these, we develop IPMAN, which estimates a 3D body from a color image in a "stable" configuration by encouraging plausible floor contact and overlap between the CoP and CoM. Our IP terms are intuitive, easy to implement, fast to compute, differentiable, and can be integrated into existing optimization and regression methods. We evaluate IPMAN on standard datasets and on MoYo, a new dataset with synchronized multi-view images, ground-truth 3D bodies with complex poses, body-floor contact, CoM, and pressure. IPMAN produces more plausible results than the state of the art, improving accuracy for static poses without hurting dynamic ones. Code and data are available for research at https://ipman.is.tue.mpg.de.
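The abstract names three quantities: a per-vertex pressure heatmap, the CoP derived from it, and the body's CoM, with stability encouraged by making the CoP and CoM overlap. The pure-Python sketch below illustrates one way terms of this kind could be computed on a closed, consistently oriented triangle mesh; it is not IPMAN's implementation. All of the following are assumptions: the up axis is y and the floor sits at height `ground`; `pressure` uses a made-up soft falloff with a hypothetical parameter `tau`; the CoM is the volume centroid under uniform density rather than a realistic body-mass model; and the stability term is the squared ground-plane distance between the projected CoM and the CoP. Function names are illustrative only.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def center_of_mass(verts: Sequence[Vec3], faces: Sequence[Face]) -> Vec3:
    """Volume centroid of a closed, consistently oriented triangle mesh.
    Assumes uniform density (a simplification of a real body)."""
    total_vol = 0.0
    acc = [0.0, 0.0, 0.0]
    for i, j, k in faces:
        a, b, c = verts[i], verts[j], verts[k]
        # Signed volume of the tetrahedron (origin, a, b, c).
        vol = _dot(a, _cross(b, c)) / 6.0
        total_vol += vol
        for d in range(3):
            # That tetrahedron's centroid is (origin + a + b + c) / 4.
            acc[d] += vol * (a[d] + b[d] + c[d]) / 4.0
    return (acc[0] / total_vol, acc[1] / total_vol, acc[2] / total_vol)


def pressure(height: float, tau: float = 0.05) -> float:
    """Hypothetical soft pressure: 1 at floor level, decaying exponentially
    above the floor and growing linearly for penetrating vertices."""
    if height < 0.0:
        return 1.0 - height / tau
    return math.exp(-height / tau)


def center_of_pressure(verts: Sequence[Vec3], up_axis: int = 1,
                       ground: float = 0.0, tau: float = 0.05) -> Tuple[float, ...]:
    """Pressure-weighted mean of vertex positions, projected onto the
    ground plane (the two non-up axes)."""
    plane_axes = [d for d in range(3) if d != up_axis]
    weights = [pressure(v[up_axis] - ground, tau) for v in verts]
    total = sum(weights)
    return tuple(sum(w * v[d] for w, v in zip(weights, verts)) / total
                 for d in plane_axes)


def stability_loss(verts: Sequence[Vec3], faces: Sequence[Face],
                   up_axis: int = 1, ground: float = 0.0,
                   tau: float = 0.05) -> float:
    """Squared ground-plane distance between the projected CoM and the CoP."""
    com = center_of_mass(verts, faces)
    cop = center_of_pressure(verts, up_axis, ground, tau)
    com_plane = [com[d] for d in range(3) if d != up_axis]
    return sum((m - p) ** 2 for m, p in zip(com_plane, cop))


# Usage: a unit cube resting on the floor (y is up), and the same cube
# sheared so that its top leans 2 units along +x.
CUBE_VERTS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
              (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
CUBE_FACES = [(0, 2, 1), (0, 3, 2), (4, 5, 6), (4, 6, 7),
              (0, 1, 5), (0, 5, 4), (3, 6, 2), (3, 7, 6),
              (0, 4, 7), (0, 7, 3), (1, 2, 6), (1, 6, 5)]
leaning = [(x + 2 * y, y, z) for x, y, z in CUBE_VERTS]
print(stability_loss(CUBE_VERTS, CUBE_FACES))  # near 0: CoM sits above the CoP
print(stability_loss(leaning, CUBE_FACES))     # near 1: CoM has drifted past the CoP
```

In this sketch the pressure function is continuous with a continuous first derivative at the floor, and every term is plain arithmetic on vertex coordinates, so the same computation could in principle be expressed in an autodiff framework and added to an optimization or regression loss. Weighting each vertex equally ignores uneven vertex density; a real body mesh may call for area-weighted pressure.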
