Contact and Human Dynamics from Monocular Video

Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors that violate physical constraints, such as feet penetrating the ground and bodies leaning at extreme angles. In this paper, we present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input. We first estimate ground contact timings with a novel prediction network which is trained without hand-labeled data. A physics-based trajectory optimization then solves for a physically-plausible motion, based on the inputs. We show this process produces motions that are significantly more realistic than those from purely kinematic methods, substantially improving quantitative measures of both kinematic and dynamic plausibility. We demonstrate our method on character animation and pose estimation tasks on dynamic motions of dancing and sports with complex contact patterns.

[1]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Jimei Yang,et al.  Reducing Footskate in Human Motion Reconstruction with Ground Contact Constraints , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[3]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[4]  Song-Chun Zhu,et al.  Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Dimitrios Tzionas,et al.  Resolving 3D Human Pose Ambiguities With 3D Scene Constraints , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  C. Karen Liu,et al.  Synthesis of biologically realistic human motion using joint torque actuation , 2019, ACM Trans. Graph..

[7]  Dimitrios Tzionas,et al.  Expressive Body Capture: 3D Hands, Face, and Body From a Single Image , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Nicolas Mansard,et al.  Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Yaser Sheikh,et al.  Monocular Total Capture: Posing Face, Body, and Hands in the Wild , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Jitendra Malik,et al.  Learning 3D Human Dynamics From Video , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Dario Pavllo,et al.  3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Alexei A. Efros,et al.  Everybody Dance Now , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Nicolas Mansard,et al.  Multicontact Locomotion of Legged Robots , 2018, IEEE Transactions on Robotics.

[14]  Jitendra Malik,et al.  SFV , 2018, ACM Trans. Graph..

[15]  Cristian Sminchisescu,et al.  Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes: The Importance of Multiple Scene Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Marco Hutter,et al.  Gait and Trajectory Optimization for Legged Systems Through Phase-Based End-Effector Parameterization , 2018, IEEE Robotics and Automation Letters.

[17]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Christian Theobalt,et al.  MonoPerfCap , 2017, ACM Trans. Graph..

[19]  Hans-Peter Seidel,et al.  VNect , 2017, ACM Trans. Graph..

[20]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[22]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[23]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[24]  Russ Tedrake,et al.  Whole-body motion planning with centroidal dynamics and full kinematics , 2014, 2014 IEEE-RAS International Conference on Humanoid Robots.

[25]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  David E. Orin,et al.  Centroidal dynamics of a humanoid robot , 2013, Auton. Robots.

[27]  Ludovic Hoyet,et al.  Push it real , 2012, ACM Trans. Graph..

[28]  Jessica K. Hodgins,et al.  Video-based 3D motion capture through biped control , 2012, ACM Trans. Graph..

[29]  Vladlen Koltun,et al.  Optimizing locomotion controllers using biologically-based actuators and objectives , 2012, ACM Trans. Graph..

[30]  Kornelia Kulig,et al.  Ground reaction forces and knee mechanics in the weight acceptance phase of a dance leap take-off and landing , 2011, Journal of sports sciences.

[31]  Martin de Lasa,et al.  Feature-based locomotion controllers , 2010, ACM Trans. Graph..

[32]  Xiaolin K. Wei,et al.  VideoMocap: modeling physically realistic human motion from monocular video sequences , 2010, ACM Trans. Graph..

[33]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[34]  David J. Fleet,et al.  Physics-Based Person Tracking Using the Anthropomorphic Walker , 2010, International Journal of Computer Vision.

[35]  David J. Fleet,et al.  Estimating contact dynamics , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[36]  Victor B. Zordan,et al.  Momentum control for balance , 2009, ACM Trans. Graph..

[37]  Marko B. Popovic,et al.  Angular momentum in human walking , 2008, Journal of Experimental Biology.

[38]  Ronan Boulic,et al.  Robust kinematic constraint detection for motion data , 2006, SCA '06.

[39]  Lorenz T. Biegler,et al.  On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming , 2006, Math. Program..

[40]  James F. O'Brien,et al.  Computational Studies of Human Motion , 2006 .

[41]  David A. Forsyth,et al.  Knowing when to put your foot down , 2006, I3D '06.

[42]  C. Karen Liu,et al.  Learning physics-based motion style with nonlinear inverse optimization , 2005, ACM Trans. Graph..

[43]  Jessica K. Hodgins,et al.  Synthesizing physically realistic human motion in low-dimensional, behavior-specific spaces , 2004, ACM Trans. Graph..

[44]  D. Gordon E. Robertson,et al.  Research Methods in Biomechanics , 2004 .

[45]  S PollardNancy,et al.  Perceptual metrics for character animation , 2003 .

[46]  Nancy S. Pollard,et al.  Efficient synthesis of physically valid human motion , 2003, ACM Trans. Graph..

[47]  Nancy S. Pollard,et al.  Perceptual metrics for character animation: sensitivity to errors in ballistic motion , 2003, ACM Trans. Graph..

[48]  Kwang-Jin Choi,et al.  On-line motion retargetting , 1999, Proceedings. Seventh Pacific Conference on Computer Graphics and Applications (Cat. No.PR00293).

[49]  Zoran Popovic,et al.  Physically based motion transformation , 1999, SIGGRAPH.

[50]  Michael Gleicher,et al.  Retargetting motion to new characters , 1998, SIGGRAPH.

[51]  P. Leva Adjustments to Zatsiorsky-Seluyanov's segment inertia parameters. , 1996 .