POMDP and Hierarchical Options MDP with Continuous Actions for Autonomous Driving at Intersections

When applying autonomous driving technology to real-world scenarios, environmental uncertainties make the development of decision-making algorithms difficult. Modeling the problem as a Partially Observable Markov Decision Process (POMDP) [1] allows the algorithm to account for these uncertainties in the decision process, making it more robust to real sensor characteristics. However, solving a POMDP with reinforcement learning (RL) [2] often requires storing a large number of past observations, and for continuous action spaces the computation becomes inefficient. This paper addresses these problems by modeling the task as an MDP and learning a policy with RL using hierarchical options (HOMDP). The proposed algorithm stores only state-action pairs and uses only the current observation to solve the POMDP problem. We compare against the time-to-collision method [3] and a POMDP-with-LSTM method. Our results show that the HOMDP approach improves the agent's performance on a four-way intersection task with two-way stop signs. The HOMDP method generates both higher-level discrete options and lower-level continuous actions using only the observation of the current step.
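To make the two-level option structure concrete, the sketch below shows one plausible way such a policy could be organized: a high-level network selects a discrete option from the current observation, and an option-conditioned low-level network outputs a bounded continuous action (e.g., an acceleration command). This is a minimal illustration, not the authors' implementation; the observation dimension, option count, layer sizes, and greedy option selection are all assumptions.

```python
# Minimal sketch (assumed architecture, not the paper's exact networks) of a
# hierarchical policy that uses only the current observation.
import torch
import torch.nn as nn

OBS_DIM, NUM_OPTIONS, ACT_DIM = 10, 3, 1  # illustrative dimensions

class HighLevelPolicy(nn.Module):
    """Maps the current observation to Q-values over discrete options."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, NUM_OPTIONS))

    def forward(self, obs):
        return self.net(obs)

class LowLevelPolicy(nn.Module):
    """Maps (observation, one-hot option) to a bounded continuous action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM + NUM_OPTIONS, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())

    def forward(self, obs, option_onehot):
        return self.net(torch.cat([obs, option_onehot], dim=-1))

def act(obs, high, low):
    """Greedily pick an option, then the continuous action for that option."""
    with torch.no_grad():
        option = high(obs).argmax(dim=-1)
        onehot = torch.nn.functional.one_hot(option, NUM_OPTIONS).float()
        action = low(obs, onehot)
    return option, action

# Usage on a dummy observation of the current step only.
obs = torch.zeros(1, OBS_DIM)
option, action = act(obs, HighLevelPolicy(), LowLevelPolicy())
```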

[1] G. Monahan, State of the Art—A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms, 1982.

[2] Weilong Song et al., Intention-Aware Autonomous Driving Decision-Making in an Uncontrolled Intersection, 2016.

[3] Pascal Poupart et al., On Improving Deep Reinforcement Learning for POMDPs, 2017, ArXiv.

[4] Maxim Egorov et al., Deep Reinforcement Learning with POMDPs, 2015.

[5] David Isele et al., Navigating Intersections with Autonomous Vehicles using Deep Reinforcement Learning, 2017.

[6] David Isele et al., Navigating Occluded Intersections with Autonomous Vehicles Using Deep Reinforcement Learning, 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[7] Shimon Whiteson et al., Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks, 2016, ArXiv.

[8] Joshua B. Tenenbaum et al., Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, 2016, NIPS.

[9] Satinder Singh, Transfer of learning by composing solutions of elemental sequential tasks, 2004, Machine Learning.

[10] Milos Hauskrecht et al., Hierarchical Solution of Markov Decision Processes using Macro-actions, 1998, UAI.

[11] Xiao Lin et al., Research on car-following model based on SUMO, 2014, The 7th IEEE/International Conference on Advanced Infocomm Technology.

[12] David Isele et al., Transferring Autonomous Driving Knowledge on Simulated and Real Intersections, 2017, ArXiv.

[13] Ronen I. Brafman et al., R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.

[14] John M. Dolan et al., Intention estimation for ramp merging control in autonomous driving, 2017, 2017 IEEE Intelligent Vehicles Symposium (IV).

[15] David N. Lee et al., A Theory of Visual Control of Braking Based on Information about Time-to-Collision, 1976, Perception.

[16] Peter Stone et al., Deep Recurrent Q-Learning for Partially Observable MDPs, 2015, AAAI Fall Symposia.

[17] Richard S. Sutton et al., Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[18] Peter Stone et al., Hierarchical model-based reinforcement learning: R-max + MAXQ, 2008, ICML '08.

[19] Thomas G. Dietterich, The MAXQ Method for Hierarchical Reinforcement Learning, 1998, ICML.

[20] Daniel Krajzewicz et al., Recent Development and Applications of SUMO - Simulation of Urban MObility, 2012.

[21] John M. Dolan et al., Lane-change social behavior generator for autonomous driving car by non-parametric regression in Reproducing Kernel Hilbert Space, 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[22] Peter Vrancx et al., Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets, 2017, AAAI.

[23] Alex Graves et al., Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.

[24] Craig Boutilier et al., Bounded Finite State Controllers, 2003, NIPS.

[25] Yuval Tassa et al., Continuous control with deep reinforcement learning, 2015, ICLR.