SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos

We present SLoMo, a first-of-its-kind framework for transferring skilled motions from casually captured "in the wild" video footage of humans and animals to legged robots. SLoMo works in three stages: 1) synthesize a physically plausible reconstructed key-point trajectory from monocular videos; 2) optimize, offline, a dynamically feasible reference trajectory for the robot, including body and foot motion as well as contact sequences, that closely tracks the key points; 3) track the reference trajectory online using a general-purpose model-predictive controller on robot hardware. Traditional motion imitation for legged motor skills often requires expert animators, collaborative demonstrations, and/or expensive motion capture equipment, all of which limit scalability. Instead, SLoMo relies only on easy-to-obtain monocular video footage, readily available in online repositories such as YouTube. It converts videos into motion primitives that can be executed reliably by real-world robots. We demonstrate our approach by transferring the motions of cats, dogs, and humans to example robots including a quadruped (on hardware) and a humanoid (in simulation). To the best of the authors' knowledge, this is the first attempt at a general-purpose motion transfer framework that imitates animal and human motions on legged robots directly from casual videos without artificial markers or labels.
