Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation

Enabling robots to autonomously navigate complex environments is essential for real-world deployment. Prior methods approach this problem by having the robot maintain an internal map of the world and then using localization and planning to navigate through that map. However, these approaches often rely on a variety of assumptions, are computationally intensive, and do not learn from failures. In contrast, learning-based methods improve as the robot acts in the environment, but are difficult to deploy in the real world due to their high sample complexity. To address the need to learn complex policies with few samples, we propose a generalized computation graph that subsumes value-based model-free methods and model-based methods, with specific instantiations interpolating between the two. We then instantiate this graph to form a navigation model that learns from raw images and is sample efficient. Our simulated car experiments explore the design decisions of our navigation model and show that our approach outperforms single-step and $N$-step double Q-learning. We also evaluate our approach on a real-world RC car and show that it can learn to navigate through a complex indoor environment with a few hours of fully autonomous, self-supervised training. Videos of the experiments and code can be found at github.com/gkahn13/gcg
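The interpolation between model-free and model-based instantiations can be illustrated with a minimal sketch (not the authors' code; the function and parameter names `n_step_value`, `H`, `gamma`, and `v_terminal` are illustrative assumptions): the graph scores an action sequence by combining H predicted per-step rewards with a bootstrapped terminal value. With H = 1 this reduces to a standard model-free Q-learning target; with larger H and per-step reward predictions it behaves more like a model-based rollout.

```python
# Hedged sketch of how a generalized computation graph can interpolate
# between model-free and model-based value estimation. Not the paper's
# implementation; names are illustrative.

def n_step_value(predicted_rewards, v_terminal, gamma=0.99):
    """Discounted sum of H predicted per-step rewards plus a bootstrapped
    terminal value estimate: sum_i gamma^i * r_i + gamma^H * V."""
    value = 0.0
    for i, r in enumerate(predicted_rewards):
        value += (gamma ** i) * r
    value += (gamma ** len(predicted_rewards)) * v_terminal
    return value

# H = 1: a single predicted reward plus bootstrap (model-free Q-learning target).
q_model_free = n_step_value([1.0], v_terminal=10.0)

# H = 4: four predicted rewards plus bootstrap (closer to a model-based rollout).
q_model_based = n_step_value([1.0, 1.0, 1.0, 1.0], v_terminal=10.0)
```

Varying the horizon H is one axis along which such a graph trades off the low bias of longer model-based rollouts against the lower variance and sample efficiency of bootstrapped model-free targets.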
