Hierarchically Integrated Models: Learning to Navigate from Heterogeneous Robots

Deep reinforcement learning algorithms require large and diverse datasets to learn successful policies for perception-based mobile navigation. However, gathering such datasets with a single robot can be prohibitively expensive. Collecting data with multiple robotic platforms, each with possibly different dynamics, is a more scalable approach to large-scale data collection. But how can deep reinforcement learning algorithms leverage such heterogeneous datasets? In this work, we propose a deep reinforcement learning algorithm with hierarchically integrated models (HInt). At training time, HInt learns separate perception and dynamics models; at test time, it integrates the two hierarchically and plans actions with the integrated model. Planning with hierarchically integrated models allows the algorithm to train on datasets gathered by a variety of platforms while respecting the physical capabilities of the deployment robot at test time. Our mobile navigation experiments show that HInt outperforms conventional hierarchical policies and single-source approaches.
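
To make the train/test split concrete, below is a minimal sketch of planning with a hierarchically integrated model. It assumes the shared interface between the two models is a sequence of planar poses, and it substitutes a unicycle model and a pseudo-random scorer for the learned dynamics and perception networks; the function names, cost weights, and action limits are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical stand-ins for the two learned models. In HInt these are neural
# networks trained on separate (possibly heterogeneous) datasets; everything
# below is an illustrative assumption, not the authors' implementation.

def dynamics_model(state, actions):
    """Robot-specific dynamics: roll an action sequence out into a sequence of
    intermediate poses. A unicycle model stands in for the learned network."""
    x, y, theta = state
    poses = []
    for v, w in actions:  # (linear velocity, angular velocity) commands
        theta += w
        x += v * np.cos(theta)
        y += v * np.sin(theta)
        poses.append((x, y, theta))
    return np.array(poses)

def perception_model(image, poses):
    """Robot-agnostic perception: given the current image and a candidate pose
    sequence, predict a task-relevant quantity such as collision probability.
    A deterministic pseudo-random scorer stands in for the learned network."""
    seed = int(np.abs(poses).sum() * 1e6) % (2 ** 32)
    return np.random.default_rng(seed).random()

def plan(image, state, goal, num_samples=128, horizon=8, collision_weight=10.0):
    """Random-shooting planner over the integrated model: candidate actions are
    passed through the dynamics model to obtain poses, which the perception
    model scores; the lowest-cost action sequence wins."""
    rng = np.random.default_rng(0)
    best_cost, best_actions = np.inf, None
    for _ in range(num_samples):
        # Sample a candidate action sequence within the robot's action limits.
        actions = rng.uniform([0.0, -0.5], [1.0, 0.5], size=(horizon, 2))
        poses = dynamics_model(state, actions)        # actions -> poses
        p_collision = perception_model(image, poses)  # poses + image -> score
        cost = np.linalg.norm(poses[-1, :2] - goal) + collision_weight * p_collision
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions[0]  # execute the first action, then replan (MPC-style)

if __name__ == "__main__":
    image = np.zeros((64, 64, 3))  # placeholder camera observation
    action = plan(image, state=(0.0, 0.0, 0.0), goal=np.array([3.0, 1.0]))
    print("first planned action (v, w):", action)
```

In this scheme, deploying on a new robot only requires swapping dynamics_model, while the perception model, trained on data from many platforms, is reused unchanged; this is what lets the planner exploit heterogeneous training data while still respecting the deployment robot's physical capabilities.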
