A Survey on Visual Navigation for Artificial Agents With Deep Reinforcement Learning

Visual navigation (vNavigation) is a key and fundamental technology for artificial agents’ interaction with the environment to achieve advanced behaviors. Visual navigation for artificial agents with deep reinforcement learning (DRL) is a new research hotspot in artificial intelligence and robotics that incorporates the decision making of DRL into visual navigation. Visual navigation via DRL, an end-to-end method, directly receives the high-dimensional images and generates an optimal navigation policy. In this paper, we first present an overview on reinforcement learning (RL), deep learning (DL) and deep reinforcement learning (DRL). Then, we systematically describe five main categories of visual DRL navigation: direct DRL vNavigation, hierarchical DRL vNavigation, multi-task DRL vNavigation, memory-inference DRL vNavigation and vision-language DRL vNavigation. These visual DRL navigation algorithms are reviewed in detail. Finally, we discuss the challenges and some possible opportunities to visual DRL navigation for artificial agents.

[1]  Demis Hassabis,et al.  Mastering the game of Go without human knowledge , 2017, Nature.

[2]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Raia Hadsell,et al.  Learning to Navigate in Cities Without a Map , 2018, NeurIPS.

[4]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5]  Yu Cheng,et al.  StoryGAN: A Sequential Conditional GAN for Story Visualization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[7]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[8]  John J. Leonard,et al.  Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age , 2016, IEEE Transactions on Robotics.

[9]  Wojciech Jaskowski,et al.  ViZDoom: A Doom-based AI research platform for visual reinforcement learning , 2016, 2016 IEEE Conference on Computational Intelligence and Games (CIG).

[10]  Rahul Sukthankar,et al.  Cognitive Mapping and Planning for Visual Navigation , 2017, International Journal of Computer Vision.

[11]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[12]  Sebastian Nowozin,et al.  f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization , 2016, NIPS.

[13]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[14]  Sanja Fidler,et al.  MovieQA: Understanding Stories in Movies through Question-Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[16]  Roger Wattenhofer,et al.  Teaching a Machine to Read Maps with Deep Reinforcement Learning , 2017, AAAI.

[17]  Yuxiang Sun,et al.  Probabilistic End-to-End Vehicle Navigation in Complex Dynamic Environments With Multimodal Sensor Fusion , 2020, IEEE Robotics and Automation Letters.

[18]  Patrick M. Pilarski,et al.  Model-Free reinforcement learning with continuous action in practice , 2012, 2012 American Control Conference (ACC).

[19]  Siwei Ma,et al.  Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Sinno Jialin Pan,et al.  Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay , 2017, AAAI.

[21]  Demis Hassabis,et al.  Neural Episodic Control , 2017, ICML.

[22]  Yuxi Li,et al.  Deep Reinforcement Learning: An Overview , 2017, ArXiv.

[23]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[24]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[25]  Sridhar Mahadevan,et al.  Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..

[26]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Xinlei Chen,et al.  Mind's eye: A recurrent visual representation for image caption generation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[29]  Vijay Kumar,et al.  Memory Augmented Control Networks , 2017, ICLR.

[30]  Yan Gan,et al.  Sentence guided object color change by adversarial learning , 2020, Neurocomputing.

[31]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[32]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[33]  Eric P. Xing,et al.  Gated Path Planning Networks , 2018, ICML.

[34]  Tom Schaul,et al.  FeUdal Networks for Hierarchical Reinforcement Learning , 2017, ICML.

[35]  J. O’Neill,et al.  Play it again: reactivation of waking experience and memory , 2010, Trends in Neurosciences.

[36]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[37]  Daniel Cremers,et al.  Direct Sparse Odometry , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Wolfram Burgard,et al.  Neural SLAM: Learning to Explore with External Memory , 2017, 1706.09520.

[39]  Jianfeng Gao,et al.  Recurrent Reinforcement Learning: A Hybrid Approach , 2015, ArXiv.

[40]  Trevor Darrell,et al.  Adversarial Feature Learning , 2016, ICLR.

[41]  Razvan Pascanu,et al.  Learning to Navigate in Complex Environments , 2016, ICLR.

[42]  Shuzhi Sam Ge,et al.  New potential functions for mobile robot path planning , 2000, IEEE Trans. Robotics Autom..

[43]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[44]  Sergey Levine,et al.  Data-Efficient Hierarchical Reinforcement Learning , 2018, NeurIPS.

[45]  Guy Lever,et al.  Deterministic Policy Gradient Algorithms , 2014, ICML.

[46]  Philip H. S. Torr,et al.  Playing Doom with SLAM-Augmented Deep Reinforcement Learning , 2016, ArXiv.

[47]  Jason Baldridge,et al.  Transferable Representation Learning in Vision-and-Language Navigation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[48]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[49]  Paolo Valigi,et al.  Deep Reinforcement Learning for Instruction Following Visual Navigation in 3D Maze-Like Environments , 2020, IEEE Robotics and Automation Letters.

[50]  Yann LeCun,et al.  Energy-based Generative Adversarial Network , 2016, ICLR.

[51]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[52]  Naoyuki Kubota,et al.  Episodic Memory Multimodal Learning for Robot Sensorimotor Map Building and Navigation , 2019, IEEE Transactions on Cognitive and Developmental Systems.

[53]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[54]  N. Daw,et al.  Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework , 2017, Annual review of psychology.

[55]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[56]  Ruslan Salakhutdinov,et al.  Neural Map: Structured Memory for Deep Reinforcement Learning , 2017, ICLR.

[57]  Louis-Philippe Morency,et al.  Multimodal Machine Learning: A Survey and Taxonomy , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[58]  Doina Precup,et al.  The Option-Critic Architecture , 2016, AAAI.

[59]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[60]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[61]  Peter Dayan,et al.  Q-learning , 1992, Machine Learning.

[62]  Razvan Pascanu,et al.  Progressive Neural Networks , 2016, ArXiv.

[63]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Tom Schaul,et al.  Universal Value Function Approximators , 2015, ICML.

[65]  Marc Peter Deisenroth,et al.  Deep Reinforcement Learning: A Brief Survey , 2017, IEEE Signal Processing Magazine.

[66]  Philip S. Thomas,et al.  Natural Option Critic , 2019, AAAI.

[67]  Yee Whye Teh,et al.  Distral: Robust multitask reinforcement learning , 2017, NIPS.

[68]  Razvan Pascanu,et al.  Vector-based navigation using grid-like representations in artificial agents , 2018, Nature.

[69]  Tom Schaul,et al.  Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.

[70]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[71]  Ruslan Salakhutdinov,et al.  Active Neural Localization , 2018, ICLR.

[72]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[73]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[74]  Shane Legg,et al.  DeepMind Lab , 2016, ArXiv.

[75]  Pieter Abbeel,et al.  Value Iteration Networks , 2016, NIPS.

[76]  Jason Weston,et al.  Memory Networks , 2014, ICLR.

[77]  Shuzhi Sam Ge,et al.  Simultaneous Path Planning and Topological Mapping (SP2ATM) for environment exploration and goal oriented navigation , 2011, Robotics Auton. Syst..

[78]  Dan Klein,et al.  Speaker-Follower Models for Vision-and-Language Navigation , 2018, NeurIPS.

[79]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[80]  Joshua B. Tenenbaum,et al.  Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.

[81]  Daniel Cremers,et al.  LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[82]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[83]  Qi Wu,et al.  Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[84]  G. Klein,et al.  Parallel Tracking and Mapping for Small AR Workspaces , 2007, 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality.

[85]  Pieter Abbeel,et al.  Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments , 2017, ICLR.

[86]  Yoshua Bengio,et al.  Mode Regularized Generative Adversarial Networks , 2016, ICLR.

[87]  Sergio Gomez Colmenarejo,et al.  Hybrid computing using a neural network with dynamic external memory , 2016, Nature.

[88]  Peter Dayan,et al.  Hippocampal Contributions to Control: The Third Way , 2007, NIPS.

[89]  Honglak Lee,et al.  Control of Memory, Active Perception, and Action in Minecraft , 2016, ICML.

[90]  David Pfau,et al.  Unrolled Generative Adversarial Networks , 2016, ICLR.

[91]  Shie Mannor,et al.  A Deep Hierarchical Approach to Lifelong Learning in Minecraft , 2016, AAAI.

[92]  Liam Paull,et al.  Deep Active Localization , 2019, IEEE Robotics and Automation Letters.

[93]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[94]  Yuan-Fang Wang,et al.  Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[95]  Sergey Levine,et al.  Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm , 2017, ICLR.

[96]  James L. McClelland,et al.  Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. , 1995, Psychological review.

[97]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[98]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[99]  Razvan Pascanu,et al.  Policy Distillation , 2015, ICLR.

[100]  Long Lin,et al.  Memory Approaches to Reinforcement Learning in Non-Markovian Domains , 1992 .

[101]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[102]  Honglak Lee,et al.  Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning , 2017, ICML.

[103]  Sergey Levine,et al.  Near-Optimal Representation Learning for Hierarchical Reinforcement Learning , 2018, ICLR.

[104]  Abdullah Al Mamun,et al.  Boundary following and globally convergent path planning using instant goals , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[105]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[106]  Sergey Levine,et al.  Probabilistic Model-Agnostic Meta-Learning , 2018, NeurIPS.

[107]  A. Gershman,et al.  A Generalized Model for Multimodal Perception , 2017 .

[108]  Michael Milford,et al.  One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay , 2017, ArXiv.

[109]  John N. Tsitsiklis,et al.  Actor-Critic Algorithms , 1999, NIPS.

[110]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[111]  Guoquan Huang,et al.  Visual-Inertial Navigation: A Concise Review , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[112]  Marc Pollefeys,et al.  Episodic Curiosity through Reachability , 2018, ICLR.

[113]  Sergey Levine,et al.  Search on the Replay Buffer: Bridging Planning and Reinforcement Learning , 2019, NeurIPS.

[114]  Alex Graves,et al.  Neural Turing Machines , 2014, ArXiv.

[115]  Xin Ye,et al.  From Seeing to Moving: A Survey on Learning for Visual Indoor Navigation (VIN) , 2020, ArXiv.

[116]  Hussein A. Abbass,et al.  Hierarchical Deep Reinforcement Learning for Continuous Action Control , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[117]  David Pfau,et al.  Connecting Generative Adversarial Networks and Actor-Critic Methods , 2016, ArXiv.

[118]  Ole Winther,et al.  Autoencoding beyond pixels using a learned similarity metric , 2015, ICML.

[119]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).