Catch & Carry: Reusable Neural Controllers for Vision-Guided Whole-Body Tasks

We address the longstanding challenge of producing flexible, realistic humanoid character controllers that can perform diverse whole-body tasks involving object interactions. This challenge is central to a variety of fields, from graphics and animation to robotics and motor neuroscience. Our physics-based environment uses realistic actuation and first-person perception -- including touch sensors and egocentric vision -- with a view to producing active-sensing behaviors (e.g. gaze direction), transferability to real robots, and comparisons with biology. We develop an integrated neural-network-based approach consisting of a motor primitive module, human demonstrations, and an instructed reinforcement learning regime with curricula and task variations. We demonstrate the utility of our approach on several tasks, including goal-conditioned box carrying and ball catching, and we characterize its behavioral robustness. The resulting controllers can be deployed in real time on a standard PC. See overview video, this https URL.
