Probabilistic MDP-behavior planning for cars

This paper presents a method for high-level decision making in traffic environments. In contrast to the usual approach of modeling decision policies by hand, a Markov Decision Process (MDP) is employed to plan the optimal policy by assessing the outcomes of actions. Using probability theory, decisions are deduced automatically from knowledge about how road users behave over time. This approach neither depends on explicit situation recognition nor is it limited to a predefined set of situations or types of descriptions. Hence it is versatile and powerful. The contribution of this paper is a mathematical framework for deriving abstract symbolic states from complex continuous temporal models encoded as Dynamic Bayesian Networks (DBN). For this purpose, discrete MDP states are interpreted as random variables. To keep computation feasible, the state space grows adaptively during planning, according to the problem to be solved.
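The core idea of planning an optimal policy by assessing the probabilistic outcomes of actions can be sketched with standard value iteration on a small discrete MDP. The states, actions, transition probabilities, and rewards below are hypothetical illustrations of a car-following scenario, not the paper's actual model:

```python
# Toy driving MDP solved by value iteration (illustrative only).
# States: distance-to-lead-vehicle buckets; actions: keep speed / brake.
# All probabilities and rewards here are made-up assumptions.

STATES = ["far", "close", "crash"]
ACTIONS = ["keep", "brake"]

# P[s][a] = list of (next_state, probability) pairs
P = {
    "far":   {"keep":  [("far", 0.8), ("close", 0.2)],
              "brake": [("far", 1.0)]},
    "close": {"keep":  [("close", 0.5), ("crash", 0.5)],
              "brake": [("far", 0.7), ("close", 0.3)]},
    "crash": {"keep":  [("crash", 1.0)],
              "brake": [("crash", 1.0)]},
}

# Immediate rewards: progress is rewarded, braking costs a little,
# a crash is heavily penalized and absorbing.
R = {"far":   {"keep": 1.0,    "brake": -0.1},
     "close": {"keep": 1.0,    "brake": -0.1},
     "crash": {"keep": -100.0, "brake": -100.0}}

def value_iteration(gamma=0.95, eps=1e-6):
    """Compute the optimal value function and greedy policy."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            # Bellman backup: expected return of each action in state s
            q = {a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                 for a in ACTIONS}
            best = max(q.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    policy = {s: max(ACTIONS,
                     key=lambda a: R[s][a]
                     + gamma * sum(p * V[s2] for s2, p in P[s][a]))
              for s in STATES}
    return V, policy

V, policy = value_iteration()
print(policy)  # the planned policy brakes in the "close" state
```

The paper's contribution goes beyond this sketch: the discrete states are not fixed in advance but derived from continuous DBN behavior models and grown adaptively during planning.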
