Target-driven visual navigation in indoor scenes using deep reinforcement learning

Two less-addressed issues of deep reinforcement learning are (1) lack of generalization to new goals, and (2) data inefficiency: the model requires many (often costly) episodes of trial and error to converge, which makes it impractical to apply to real-world scenarios. In this paper, we address these two issues and apply our model to target-driven visual navigation. To address the first issue, we propose an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization. To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine. Our framework enables agents to take actions and interact with objects, so we can collect a large number of training samples efficiently. We show that our proposed method (1) converges faster than state-of-the-art deep reinforcement learning methods, (2) generalizes across targets and scenes, (3) generalizes to a real robot scenario with a small amount of fine-tuning (although the model is trained in simulation), and (4) is end-to-end trainable and requires no feature engineering, feature matching between frames, or 3D reconstruction of the environment.
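The key architectural idea above, a policy conditioned on the goal as well as the current state, can be sketched as follows. This is a minimal illustrative mock-up, not the authors' implementation: the shared (siamese-style) embedding, the fusion by concatenation, the layer sizes, and the action count are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8      # per-input embedding size (assumed)
FUSED_DIM = 16     # joint embedding size (assumed)
NUM_ACTIONS = 4    # e.g. move forward/backward, turn left/right (assumed)

# Shared embedding weights applied to both the current observation and the
# target observation, so a single network serves arbitrary navigation goals.
W_embed = rng.normal(0.0, 0.1, (32, EMBED_DIM))
W_fuse = rng.normal(0.0, 0.1, (2 * EMBED_DIM, FUSED_DIM))
W_policy = rng.normal(0.0, 0.1, (FUSED_DIM, NUM_ACTIONS))
W_value = rng.normal(0.0, 0.1, (FUSED_DIM, 1))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def actor_critic(state_feat, goal_feat):
    """Return (action probabilities, state value) for a state-goal pair."""
    s = np.tanh(state_feat @ W_embed)             # embed current observation
    g = np.tanh(goal_feat @ W_embed)              # embed target observation
    h = np.tanh(np.concatenate([s, g]) @ W_fuse)  # joint state-goal representation
    return softmax(h @ W_policy), float(h @ W_value)

# Stand-ins for CNN features of the current frame and the target image.
state = rng.normal(size=32)
goal = rng.normal(size=32)
probs, value = actor_critic(state, goal)
```

Because the goal enters the network as an input rather than being baked into the weights, changing the target at test time requires only feeding a different goal feature, which is what enables the cross-target generalization claimed in the abstract.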
