Learning to Navigate Through Complex Dynamic Environment With Modular Deep Reinforcement Learning

In this paper, we propose an end-to-end modular reinforcement learning architecture for navigation in complex dynamic environments with rapidly moving obstacles. The architecture divides the main task into two subtasks: local obstacle avoidance and global navigation. For obstacle avoidance, we develop a two-stream Q-network that processes spatial and temporal information separately and generates action values. The global navigation subtask is handled by a conventional Q-network. An online learning network and an action scheduler are introduced to first combine the two pretrained subtask policies and then continue exploring and optimizing until a stable composite policy is obtained. On the obstacle avoidance subtask, the two-stream Q-network outperforms the conventional deep Q-learning approach. Experiments on the main task demonstrate that the proposed architecture efficiently avoids moving obstacles and completes the navigation task with a high success rate. The modular design also enables parallel training of the subtask policies and generalizes well across different environments.
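For concreteness, the sketch below illustrates the two ideas the abstract names: a Q-network with separate spatial and temporal streams whose features are fused into a single set of action values, and a simple scheduler that arbitrates between the two pretrained subtask policies. This is a minimal PyTorch sketch, not the authors' implementation; the layer sizes, the 84x84 grayscale input, the frame-difference temporal encoding, and the danger-threshold arbitration rule are all illustrative assumptions.

```python
# Minimal sketch of a two-stream Q-network: one convolutional stream over
# the current frame (spatial) and one over stacked frame differences
# (temporal), fused by concatenation into per-action Q-values.
# All sizes below are assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class TwoStreamQNetwork(nn.Module):
    def __init__(self, n_actions: int, n_diff_frames: int = 3):
        super().__init__()

        def conv_stream(in_channels: int) -> nn.Sequential:
            # DQN-style convolutional trunk for an assumed 84x84 input.
            return nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
                nn.Flatten(),
            )

        self.spatial = conv_stream(1)               # current grayscale frame
        self.temporal = conv_stream(n_diff_frames)  # stacked frame differences
        fused_dim = 2 * 64 * 7 * 7                  # two 64x7x7 feature maps
        self.head = nn.Sequential(
            nn.Linear(fused_dim, 512), nn.ReLU(),
            nn.Linear(512, n_actions),              # one Q-value per action
        )

    def forward(self, frame: torch.Tensor, diffs: torch.Tensor) -> torch.Tensor:
        # Late fusion: concatenate stream features, then predict Q-values.
        fused = torch.cat([self.spatial(frame), self.temporal(diffs)], dim=1)
        return self.head(fused)

# Hypothetical action scheduler: let the avoidance policy override the
# navigation policy when even its best Q-value signals imminent danger.
# The threshold rule is an assumption standing in for the learned scheduler.
def schedule_action(q_avoid: torch.Tensor, q_nav: torch.Tensor,
                    danger_threshold: float = 0.0) -> int:
    if q_avoid.max().item() < danger_threshold:  # low best-case value: danger
        return int(q_avoid.argmax().item())      # take the evasive action
    return int(q_nav.argmax().item())            # otherwise follow navigation
```

Fusing by feature concatenation before the Q-head mirrors the late-fusion variant of two-stream networks for action recognition; other fusion points, or a learned gating network in place of the threshold rule, would fit the same interface.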
