Contact and Human Dynamics from Monocular Video

Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate but contain visible errors that violate physical constraints, such as feet penetrating the ground and bodies leaning at extreme angles. In this paper, we present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input. We first estimate ground contact timings with a novel prediction network that is trained without hand-labeled data. A physics-based trajectory optimization then solves for a physically plausible motion based on these inputs. We show that this process produces motions that are significantly more realistic than those from purely kinematic methods, substantially improving quantitative measures of both kinematic and dynamic plausibility. We demonstrate our method on character animation and pose estimation tasks, using dynamic dancing and sports motions with complex contact patterns.
