Learning to Navigate Through Complex Dynamic Environment With Modular Deep Reinforcement Learning

In this paper, we propose an end-to-end modular reinforcement learning architecture for navigation in complex dynamic environments with rapidly moving obstacles. The architecture divides the main task into two subtasks: local obstacle avoidance and global navigation. For obstacle avoidance, we develop a two-stream Q-network that processes spatial and temporal information separately and generates action values. The global navigation subtask is handled by a conventional Q-network. An online learning network and an action scheduler are introduced to first combine the two pretrained subtask policies and then continue exploring and optimizing until a stable composite policy is obtained. On the obstacle avoidance subtask, the two-stream Q-network outperforms the conventional deep Q-learning approach. Experiments on the main task demonstrate that the proposed architecture efficiently avoids moving obstacles and completes the navigation task with a high success rate. The modular design also enables parallel training of the subtask policies and generalizes well across different environments.
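For concreteness, the sketch below illustrates the two ideas the abstract names: a Q-network with separate spatial and temporal streams whose features are fused into a single set of action values, and a simple scheduler that arbitrates between the two pretrained subtask policies. This is a minimal PyTorch sketch, not the authors' implementation; the layer sizes, the 84x84 grayscale input, the frame-difference temporal encoding, and the danger-threshold arbitration rule are all illustrative assumptions.

```python
# Minimal sketch of a two-stream Q-network: one convolutional stream over
# the current frame (spatial) and one over stacked frame differences
# (temporal), fused by concatenation into per-action Q-values.
# All sizes below are assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class TwoStreamQNetwork(nn.Module):
    def __init__(self, n_actions: int, n_diff_frames: int = 3):
        super().__init__()

        def conv_stream(in_channels: int) -> nn.Sequential:
            # DQN-style convolutional trunk for an assumed 84x84 input.
            return nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
                nn.Flatten(),
            )

        self.spatial = conv_stream(1)               # current grayscale frame
        self.temporal = conv_stream(n_diff_frames)  # stacked frame differences
        fused_dim = 2 * 64 * 7 * 7                  # two 64x7x7 feature maps
        self.head = nn.Sequential(
            nn.Linear(fused_dim, 512), nn.ReLU(),
            nn.Linear(512, n_actions),              # one Q-value per action
        )

    def forward(self, frame: torch.Tensor, diffs: torch.Tensor) -> torch.Tensor:
        # Late fusion: concatenate stream features, then predict Q-values.
        fused = torch.cat([self.spatial(frame), self.temporal(diffs)], dim=1)
        return self.head(fused)

# Hypothetical action scheduler: let the avoidance policy override the
# navigation policy when even its best Q-value signals imminent danger.
# The threshold rule is an assumption standing in for the learned scheduler.
def schedule_action(q_avoid: torch.Tensor, q_nav: torch.Tensor,
                    danger_threshold: float = 0.0) -> int:
    if q_avoid.max().item() < danger_threshold:  # low best-case value: danger
        return int(q_avoid.argmax().item())      # take the evasive action
    return int(q_nav.argmax().item())            # otherwise follow navigation
```

Fusing by feature concatenation before the Q-head mirrors the late-fusion variant of two-stream networks for action recognition; other fusion points, or a learned gating network in place of the threshold rule, would fit the same interface.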
