Learning Navigation Skills for Legged Robots with Learned Robot Embeddings

Navigation policies are commonly learned on idealized cylinder agents in simulation, without modelling the complex dynamics, such as contact dynamics, that arise from the interaction between the robot and the environment. Such policies perform poorly when deployed on complex and dynamic robots, such as legged robots. In this work, we learn hierarchical navigation policies that account for the low-level dynamics of legged robots, such as maximum speed and slipping, and achieve good performance at navigating cluttered indoor environments. Once such a policy is learned on one legged robot, it does not directly generalize to a different robot because of dynamical differences, which increases the cost of learning such a policy on new robots. To overcome this challenge, we learn dynamics-aware navigation policies across multiple robots with robot-specific embeddings, which enable generalization to new, unseen robots. We train our policies across three legged robots: two quadrupeds (A1, AlienGo) and a hexapod (Daisy). At test time, we study the performance of our learned policy on two new legged robots (Laikago and a 4-legged Daisy) and show that it can sample-efficiently generalize to previously unseen robots.
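The embedding-conditioned policy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, robot names as dictionary keys, the linear "policy" standing in for a neural network, and the `add_new_robot` helper are all assumptions made for clarity. The key idea it shows is that the policy weights are shared across robots, while each robot contributes only a small embedding vector; adapting to an unseen robot then requires learning just that vector.

```python
import random

random.seed(0)

EMBED_DIM = 4   # size of the per-robot embedding (illustrative choice)
OBS_DIM = 6     # size of the high-level navigation observation (illustrative)
ACT_DIM = 2     # e.g. a 2-D command passed to the low-level locomotion controller

# One trainable embedding vector per training robot.
robot_embeddings = {
    name: [random.uniform(-1, 1) for _ in range(EMBED_DIM)]
    for name in ("A1", "AlienGo", "Daisy")
}

# A single linear map shared across all robots (stand-in for the policy network).
weights = [[random.uniform(-1, 1) for _ in range(OBS_DIM + EMBED_DIM)]
           for _ in range(ACT_DIM)]

def policy(observation, robot_name):
    """Concatenate the observation with the robot's embedding, then map to an action."""
    x = list(observation) + robot_embeddings[robot_name]
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def add_new_robot(name):
    """For an unseen robot, initialize a fresh embedding; only this vector
    would be optimized at test time while the shared policy stays frozen."""
    robot_embeddings[name] = [0.0] * EMBED_DIM

obs = [0.5] * OBS_DIM
action = policy(obs, "A1")        # action for a training robot

add_new_robot("Laikago")
action_new = policy(obs, "Laikago")  # same policy, new robot embedding
```

In practice the embedding would be a trainable lookup (e.g. an embedding layer) optimized jointly with the policy during training, and alone during adaptation to a new robot.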
