POMDP and Hierarchical Options MDP with Continuous Actions for Autonomous Driving at Intersections
Zhiqian Qiao | Katharina Muelling | John M. Dolan | Praveen Palanisamy | Priyantha Mudalige
[1] G. Monahan. State of the Art—A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms, 1982.
[2] Weilong Song, et al. Intention-Aware Autonomous Driving Decision-Making in an Uncontrolled Intersection, 2016.
[3] Pascal Poupart, et al. On Improving Deep Reinforcement Learning for POMDPs, 2017, arXiv.
[4] Maxim Egorov, et al. Deep Reinforcement Learning with POMDPs, 2015.
[5] David Isele, et al. Navigating Intersections with Autonomous Vehicles using Deep Reinforcement Learning, 2017.
[6] David Isele, et al. Navigating Occluded Intersections with Autonomous Vehicles Using Deep Reinforcement Learning, 2018, IEEE International Conference on Robotics and Automation (ICRA).
[7] Shimon Whiteson, et al. Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks, 2016, arXiv.
[8] Joshua B. Tenenbaum, et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, 2016, NIPS.
[9] Satinder Singh. Transfer of learning by composing solutions of elemental sequential tasks, 2004, Machine Learning.
[10] Milos Hauskrecht, et al. Hierarchical Solution of Markov Decision Processes using Macro-actions, 1998, UAI.
[11] Xiao Lin, et al. Research on car-following model based on SUMO, 2014, 7th IEEE International Conference on Advanced Infocomm Technology.
[12] David Isele, et al. Transferring Autonomous Driving Knowledge on Simulated and Real Intersections, 2017, arXiv.
[13] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[14] John M. Dolan, et al. Intention estimation for ramp merging control in autonomous driving, 2017, IEEE Intelligent Vehicles Symposium (IV).
[15] David N. Lee, et al. A Theory of Visual Control of Braking Based on Information about Time-to-Collision, 1976, Perception.
[16] Peter Stone, et al. Deep Recurrent Q-Learning for Partially Observable MDPs, 2015, AAAI Fall Symposia.
[17] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[18] Peter Stone, et al. Hierarchical model-based reinforcement learning: R-max + MAXQ, 2008, ICML '08.
[19] Thomas G. Dietterich. The MAXQ Method for Hierarchical Reinforcement Learning, 1998, ICML.
[20] Daniel Krajzewicz, et al. Recent Development and Applications of SUMO - Simulation of Urban MObility, 2012.
[21] John M. Dolan, et al. Lane-change social behavior generator for autonomous driving car by non-parametric regression in Reproducing Kernel Hilbert Space, 2017, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[22] Peter Vrancx, et al. Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets, 2017, AAAI.
[23] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[24] Craig Boutilier, et al. Bounded Finite State Controllers, 2003, NIPS.
[25] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.