Paper Information - SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos

SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos

We present SLoMo: a first-of-its-kind framework for transferring skilled motions from casually captured "in the wild" video footage of humans and animals to legged robots. SLoMo works in three stages: 1) synthesize a physically plausible reconstructed key-point trajectory from monocular videos; 2) optimize, offline, a dynamically feasible reference trajectory for the robot that includes body and foot motion, as well as contact sequences, and that closely tracks the key points; 3) track the reference trajectory online using a general-purpose model-predictive controller on robot hardware. Traditional motion imitation for legged motor skills often requires expert animators, collaborative demonstrations, and/or expensive motion capture equipment, all of which limit scalability. Instead, SLoMo relies only on easy-to-obtain monocular video footage, readily available in online repositories such as YouTube. It converts videos into motion primitives that can be executed reliably by real-world robots. We demonstrate our approach by transferring the motions of cats, dogs, and humans to example robots including a quadruped (on hardware) and a humanoid (in simulation). To the best of the authors' knowledge, this is the first attempt at a general-purpose motion transfer framework that imitates animal and human motions on legged robots directly from casual videos without artificial markers or labels.
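The three-stage data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every function name here is hypothetical, the "key point" is a single scalar per frame, and each stage is replaced by a deliberately simple stand-in (pass-through instead of video reconstruction, step clamping instead of trajectory optimization, proportional feedback instead of model-predictive control).

```python
from typing import List, Sequence


def reconstruct_keypoints(frames: Sequence[float]) -> List[float]:
    """Stage 1 stand-in: pretend each video frame already yields one
    scalar key point (e.g. a body height). The real system reconstructs
    key-point trajectories from monocular video."""
    return [float(f) for f in frames]


def optimize_reference(keypoints: Sequence[float], max_step: float) -> List[float]:
    """Stage 2 stand-in: build a reference that follows the key points
    while never changing by more than max_step per frame, a crude proxy
    for a dynamic-feasibility limit. The real system solves an offline
    trajectory optimization."""
    reference = [keypoints[0]]
    for target in keypoints[1:]:
        delta = target - reference[-1]
        delta = max(-max_step, min(max_step, delta))  # clamp the step size
        reference.append(reference[-1] + delta)
    return reference


def track_online(reference: Sequence[float], gain: float, x0: float) -> List[float]:
    """Stage 3 stand-in: move the state toward each reference sample with
    proportional feedback. The real system uses a model-predictive
    controller on hardware."""
    state = x0
    executed = []
    for target in reference:
        state = state + gain * (target - state)
        executed.append(state)
    return executed


def run_pipeline(frames: Sequence[float], max_step: float, gain: float) -> List[float]:
    """Chain the three stand-in stages in the order the abstract lists them."""
    keypoints = reconstruct_keypoints(frames)
    reference = optimize_reference(keypoints, max_step)
    return track_online(reference, gain, x0=reference[0])
```

Calling `run_pipeline([0.0, 1.0, 0.2, 0.3], max_step=0.5, gain=1.0)` runs the three stages end to end. The point of the sketch is only the structure: the offline stage turns possibly infeasible key points into a feasible reference, and the online stage tracks that reference rather than the raw key points.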
