Time-sequence Action-Decision and Navigation Through Stage Deep Reinforcement Learning in Complex Dynamic Environments

Navigation in complex dynamic environments is a challenging and widely studied task. Although most existing navigation algorithms can complete the navigation task effectively, they neglect mission planning during navigation. To address this, this paper proposes a novel end-to-end two-stage deep reinforcement learning architecture for time-sequence navigation and action-decision in dynamic environments with randomly and rapidly moving obstacles. In the first training stage, a network that exploits spatial and temporal information is designed for the navigation task, while a conventional recurrent fully connected network is adopted for the action-decision task. In the second training stage, the two networks are integrated and trained online with a dynamic entropy term to obtain a stable policy for dynamic missions. Simulations demonstrate that navigation and action-decision in different environments can be completed effectively under the proposed architecture.
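
The abstract does not give implementation details, so the following is a minimal PyTorch-style sketch, under stated assumptions, of how the described two-branch, two-stage design with a dynamic entropy term could be organized: a spatio-temporal (convolution plus LSTM) branch for navigation, a recurrent fully connected branch for action-decision, and an actor-critic head trained with a decaying entropy coefficient. All class names, layer sizes, and the entropy schedule are illustrative assumptions, not the authors' published implementation.

    # Hypothetical sketch of the two-stage architecture described in the abstract.
    # Names, layer sizes, and the entropy schedule are illustrative assumptions.
    import torch
    import torch.nn as nn


    class NavigationNet(nn.Module):
        """Stage-1 navigation branch: spatial features (conv) + temporal features (LSTM)."""
        def __init__(self, obs_channels=1, hidden=128):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(obs_channels, 16, kernel_size=3, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
                nn.Flatten(),
            )
            # 32x32 input maps to a 32*7*7 feature vector after the two strided convolutions
            self.lstm = nn.LSTM(input_size=32 * 7 * 7, hidden_size=hidden, batch_first=True)

        def forward(self, obs_seq):                      # obs_seq: (B, T, C, 32, 32)
            b, t = obs_seq.shape[:2]
            feats = self.conv(obs_seq.flatten(0, 1)).view(b, t, -1)
            out, _ = self.lstm(feats)
            return out[:, -1]                            # feature at the last time step


    class DecisionNet(nn.Module):
        """Stage-1 action-decision branch: recurrent fully connected network."""
        def __init__(self, state_dim=8, hidden=64):
            super().__init__()
            self.rnn = nn.GRU(state_dim, hidden, batch_first=True)
            self.fc = nn.Linear(hidden, hidden)

        def forward(self, state_seq):                    # state_seq: (B, T, state_dim)
            out, _ = self.rnn(state_seq)
            return torch.relu(self.fc(out[:, -1]))


    class IntegratedPolicy(nn.Module):
        """Stage-2: fuse both branches into a single actor-critic policy."""
        def __init__(self, n_actions=5):
            super().__init__()
            self.nav, self.dec = NavigationNet(), DecisionNet()
            self.actor = nn.Linear(128 + 64, n_actions)
            self.critic = nn.Linear(128 + 64, 1)

        def forward(self, obs_seq, state_seq):
            z = torch.cat([self.nav(obs_seq), self.dec(state_seq)], dim=-1)
            return torch.distributions.Categorical(logits=self.actor(z)), self.critic(z)


    def actor_critic_loss(dist, value, action, advantage, ret, entropy_coef):
        """Stage-2 online loss; entropy_coef is decayed over training ("dynamic entropy", assumed schedule)."""
        policy_loss = -(dist.log_prob(action) * advantage.detach()).mean()
        value_loss = (ret - value.squeeze(-1)).pow(2).mean()
        return policy_loss + 0.5 * value_loss - entropy_coef * dist.entropy().mean()

In this reading, the two branches would be pretrained separately in stage one and then jointly fine-tuned in stage two, with the entropy coefficient gradually reduced so that exploration gives way to a stable policy; the exact fusion and schedule used by the authors are not specified in the abstract.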
