Continuous Deep Maximum Entropy Inverse Reinforcement Learning using online POMDP

A vehicle navigating in an urban environment must obey traffic rules by properly regulating its speed, such as staying below the road speed limit and avoiding collisions with other vehicles. This is the scenario autonomous vehicles are expected to face: they will share the road with other vehicles (autonomous or not) and must interact with them cooperatively. In other words, autonomous vehicles should not only follow traffic rules, but should also behave in a way that resembles the behavior of other vehicles. However, manually specifying such behavior is time-consuming and error-prone, since driving on urban roads is a complex task involving many factors. This paper presents a multitask decision-making framework that learns an expert driver's behavior in an urban scenario containing traffic lights and other vehicles. For this purpose, Inverse Reinforcement Learning (IRL) is used to learn a reward function that explains the expert driver's behavior. Most IRL approaches require solving a Markov Decision Process (MDP) in each iteration of the algorithm to compute the optimal policy under the current rewards; however, the computational cost of solving an MDP is high for large state spaces. To overcome this issue, the optimal policy is estimated by sampling trajectories in regions of the state space with higher rewards. To do so, the problem is modeled as a continuous Partially Observable Markov Decision Process (POMDP) in which the intentions of other vehicles are only partially observed, and an online solver is employed to sample trajectories under the current rewards. The efficiency of the proposed framework is demonstrated through simulations, showing that the controlled vehicle is able to mimic an expert driver's behavior.
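As a rough illustration of the learning loop described above (not the paper's implementation), the sketch below shows what a sampling-based Maximum Entropy deep IRL update could look like in PyTorch. The reward network is trained so that expert trajectories receive higher returns than trajectories drawn from the current policy; the importance weighting follows the general recipe of sampling-based MaxEnt IRL (e.g. guided cost learning). The names `RewardNet` and `irl_step` are hypothetical, and the online POMDP solver is abstracted away as the source of `sampled_trajs`.

```python
# Minimal sketch of a sampling-based MaxEnt deep IRL update (assumption:
# names and weighting scheme are illustrative, not the paper's exact method).
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps a state feature vector to a scalar reward."""
    def __init__(self, n_features, n_hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, 1),
        )

    def forward(self, features):
        return self.net(features).squeeze(-1)

def traj_return(reward_net, traj):
    """Sum of predicted rewards over a trajectory's (T x d) feature matrix."""
    return reward_net(traj).sum()

def irl_step(reward_net, optimizer, expert_trajs, sampled_trajs):
    """One MaxEnt IRL update: raise expert returns, lower weighted sample returns.

    `sampled_trajs` approximate the soft-optimal policy under the current
    reward (in the paper, drawn by an online POMDP solver); importance
    weights proportional to exp(return) estimate the partition function.
    """
    expert_term = torch.stack(
        [traj_return(reward_net, t) for t in expert_trajs]).mean()
    sample_returns = torch.stack(
        [traj_return(reward_net, t) for t in sampled_trajs])
    weights = torch.softmax(sample_returns.detach(), dim=0)  # exp(R)/Z estimate
    sample_term = (weights * sample_returns).sum()
    loss = -(expert_term - sample_term)  # negative log-likelihood surrogate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the framework described in the abstract, the sampled trajectories would be regenerated between updates by running the online POMDP solver under the current reward, which concentrates the samples in high-reward regions of the state space instead of requiring a full MDP solve per iteration.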
