RECON: Rapid Exploration for Open-World Navigation with Latent Goal Models

We describe a robotic learning system for autonomous navigation in diverse environments. At the core of our method are two components: (i) a non-parametric map that reflects the connectivity of the environment but does not require geometric reconstruction or localization, and (ii) a latent variable model of distances and actions that enables efficiently constructing and traversing this map. Trained on a large dataset of prior experience, this model predicts the expected amount of time needed to travel from the robot's current image to a goal image, as well as the action to take at the current step to do so. In the process, the model acquires a representation of goal images that is robust to distracting, task-irrelevant information. We demonstrate our method on a mobile ground robot in outdoor navigation scenarios. Given an image of a goal up to 80 meters away, our method can learn to reach that goal within 20 minutes, and can then reliably revisit it even when faced with previously unseen obstacles and weather conditions.
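
To make the two components concrete, here is a minimal sketch in PyTorch. It is not the paper's implementation: the module names, feature dimensions, network sizes, and the `edge_threshold` heuristic are all illustrative assumptions, and the image features are assumed to come from some upstream encoder. It shows the general shape of the idea: a variational encoder compresses a (current, goal) image pair into a latent goal, heads conditioned on that latent predict time-to-goal and the next action, and the predicted distances decide which edges enter the non-parametric map.

```python
import torch
import torch.nn as nn


class LatentGoalModel(nn.Module):
    """Rough sketch of a latent variable model of distances and actions.

    Encodes a (current, goal) feature pair into a compact latent goal z,
    then predicts (i) the expected number of timesteps to reach the goal
    and (ii) the action to take now, both conditioned on z.
    """

    def __init__(self, feat_dim: int = 512, latent_dim: int = 32, action_dim: int = 2):
        super().__init__()
        # q(z | current, goal): variational encoder. The bottleneck on z is
        # what would encourage a distractor-robust goal representation.
        self.encoder = nn.Sequential(
            nn.Linear(2 * feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 2 * latent_dim),  # mean and log-variance of z
        )
        self.distance_head = nn.Sequential(  # expected timesteps to the goal
            nn.Linear(feat_dim + latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )
        self.action_head = nn.Sequential(  # action to take at the current step
            nn.Linear(feat_dim + latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, current_feat: torch.Tensor, goal_feat: torch.Tensor):
        mu, log_var = self.encoder(
            torch.cat([current_feat, goal_feat], dim=-1)
        ).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
        ctx = torch.cat([current_feat, z], dim=-1)
        return self.distance_head(ctx), self.action_head(ctx), (mu, log_var)


class TopologicalMap:
    """Non-parametric map: nodes store observation features, and a directed
    edge is added whenever the model predicts a short traversal time.

    No geometry or localization is stored -- only learned connectivity.
    """

    def __init__(self, model: LatentGoalModel, edge_threshold: float = 10.0):
        self.model = model
        self.edge_threshold = edge_threshold  # illustrative cutoff, in timesteps
        self.nodes = {}   # node_id -> feature tensor of shape (feat_dim,)
        self.edges = {}   # (src_id, dst_id) -> predicted traversal time

    @torch.no_grad()
    def add_node(self, node_id, feat: torch.Tensor) -> None:
        # Check reachability in both directions against every existing node.
        for other_id, other_feat in self.nodes.items():
            for src, dst, a, b in [(node_id, other_id, feat, other_feat),
                                   (other_id, node_id, other_feat, feat)]:
                dist, _, _ = self.model(a.unsqueeze(0), b.unsqueeze(0))
                if dist.item() < self.edge_threshold:
                    self.edges[(src, dst)] = dist.item()
        self.nodes[node_id] = feat
```

Under these assumptions, navigating to a distant goal reduces to shortest-path search over `self.edges`, with the action head providing local control along each edge of the resulting plan.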
