Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation

Enabling robots to autonomously navigate complex environments is essential for real-world deployment. Prior methods approach this problem by having the robot maintain an internal map of the world and then using localization and planning to navigate through that map. However, these approaches often rely on a variety of assumptions, are computationally intensive, and do not learn from failures. In contrast, learning-based methods improve as the robot acts in the environment, but are difficult to deploy in the real world due to their high sample complexity. To address the need to learn complex policies with few samples, we propose a generalized computation graph that subsumes value-based model-free methods and model-based methods, with specific instantiations interpolating between the two. We then instantiate this graph to form a navigation model that learns from raw images and is sample efficient. Our simulated car experiments explore the design decisions of our navigation model and show that our approach outperforms single-step and $N$-step double Q-learning. We also evaluate our approach on a real-world RC car and show that it can learn to navigate through a complex indoor environment with a few hours of fully autonomous, self-supervised training. Videos of the experiments and code can be found at github.com/gkahn13/gcg
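The interpolation between model-free and model-based instantiations can be illustrated with a minimal sketch (not the authors' code; the function and parameter names `n_step_value`, `H`, `gamma`, and `v_terminal` are illustrative assumptions): the graph scores an action sequence by combining H predicted per-step rewards with a bootstrapped terminal value. With H = 1 this reduces to a standard model-free Q-learning target; with larger H and per-step reward predictions it behaves more like a model-based rollout.

```python
# Hedged sketch of how a generalized computation graph can interpolate
# between model-free and model-based value estimation. Not the paper's
# implementation; names are illustrative.

def n_step_value(predicted_rewards, v_terminal, gamma=0.99):
    """Discounted sum of H predicted per-step rewards plus a bootstrapped
    terminal value estimate: sum_i gamma^i * r_i + gamma^H * V."""
    value = 0.0
    for i, r in enumerate(predicted_rewards):
        value += (gamma ** i) * r
    value += (gamma ** len(predicted_rewards)) * v_terminal
    return value

# H = 1: a single predicted reward plus bootstrap (model-free Q-learning target).
q_model_free = n_step_value([1.0], v_terminal=10.0)

# H = 4: four predicted rewards plus bootstrap (closer to a model-based rollout).
q_model_based = n_step_value([1.0, 1.0, 1.0, 1.0], v_terminal=10.0)
```

Varying the horizon H is one axis along which such a graph trades off the low bias of longer model-based rollouts against the lower variance and sample efficiency of bootstrapped model-free targets.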
