Planning on the fast lane: Learning to interact using attention mechanisms in path integral inverse reinforcement learning

General-purpose trajectory planning algorithms for automated driving use complex reward functions to perform a combined optimization of strategic, behavioral, and kinematic features. The specification and tuning of a single reward function is tedious and does not generalize across a large set of traffic situations. Deep learning approaches based on path integral inverse reinforcement learning have been successfully applied to predict local, situation-dependent reward functions using features of a set of sampled driving policies. Sample-based trajectory planning algorithms are able to approximate a spatio-temporal subspace of feasible driving policies that can be used to encode the context of a situation. However, interaction with dynamic objects requires an extended planning horizon, which in turn depends on sequential context modeling. In this work, we address sequential reward prediction over an extended time horizon. We present a neural network architecture that uses a policy attention mechanism to generate a low-dimensional context vector by concentrating on trajectories with a human-like driving style. In addition, we propose a temporal attention mechanism to identify context switches and allow for stable adaptation of rewards. We evaluate our approach on complex simulated driving situations, including other moving vehicles. Our evaluation shows that the policy attention mechanism learns to focus on collision-free policies in the configuration space, and that the temporal attention mechanism maintains persistent interaction with other vehicles over an extended planning horizon.
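To make the policy attention mechanism concrete, the following is a minimal sketch assuming additive (Bahdanau-style) scoring over per-policy feature vectors, in the spirit of the architecture described above. The module name, dimensions, and scoring network are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class PolicyAttention(nn.Module):
    """Soft attention over a set of sampled driving policies (illustrative sketch).

    Scores each policy's feature vector with a small MLP and returns the
    attention-weighted sum as a low-dimensional context vector.
    """

    def __init__(self, feature_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, policy_features: torch.Tensor):
        # policy_features: (batch, num_policies, feature_dim)
        logits = self.score(policy_features).squeeze(-1)  # (batch, num_policies)
        weights = torch.softmax(logits, dim=-1)           # one weight per sampled policy
        context = torch.einsum("bn,bnf->bf", weights, policy_features)
        return context, weights

# Example: 200 sampled policies, each described by 12 trajectory features.
attention = PolicyAttention(feature_dim=12)
features = torch.randn(4, 200, 12)                        # batch of 4 situations
context, weights = attention(features)                    # context: (4, 12)

A temporal attention mechanism could be sketched analogously, attending over the context vectors of successive planning cycles rather than over the sampled policies of a single cycle.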
