Efficient Sampling-Based Maximum Entropy Inverse Reinforcement Learning With Application to Autonomous Driving

Over the past decades, we have witnessed significant progress in the domain of autonomous driving. Advanced techniques based on optimization and reinforcement learning have become increasingly powerful at solving the forward problem: given designed reward/cost functions, how should we optimize them to obtain driving policies that interact with the environment safely and efficiently? Such progress has raised another, equally important question: what should we optimize? Instead of manually specifying the reward functions, it is desirable to extract what human drivers try to optimize from real traffic data and assign it to autonomous vehicles, enabling more naturalistic and transparent interaction between humans and intelligent agents. To address this issue, we present an efficient sampling-based maximum-entropy inverse reinforcement learning (IRL) algorithm in this letter. Unlike existing IRL algorithms, the proposed algorithm introduces an efficient continuous-domain trajectory sampler, which allows it to learn reward functions directly in the continuous domain while accounting for the uncertainties in trajectories demonstrated by human drivers. We evaluate the proposed algorithm on real-world driving data covering both non-interactive and interactive scenarios. The experimental results show that the proposed algorithm achieves more accurate prediction performance, faster convergence, and better generalization than baseline IRL algorithms.
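For context, the following is a minimal sketch of the standard maximum-entropy IRL formulation that sampling-based variants approximate; the feature map f and weight vector theta are generic placeholders rather than the specific design used in this letter. Under the maximum-entropy assumption, a demonstrated trajectory xi is exponentially more likely when it incurs lower cost:

    P(\xi \mid \theta) = \frac{1}{Z(\theta)} \exp\!\left(-\theta^{\top} f(\xi)\right),
    \qquad
    Z(\theta) = \int \exp\!\left(-\theta^{\top} f(\bar{\xi})\right) \mathrm{d}\bar{\xi},

and the gradient of the average log-likelihood of N demonstrations reduces to a feature-matching condition,

    \nabla_{\theta} \mathcal{L}(\theta)
    = \mathbb{E}_{\xi \sim P(\cdot \mid \theta)}\!\left[ f(\xi) \right]
      - \frac{1}{N} \sum_{i=1}^{N} f\!\left(\xi_i^{\mathrm{demo}}\right)
    \approx \sum_{s=1}^{S} w_s\, f(\xi_s)
      - \frac{1}{N} \sum_{i=1}^{N} f\!\left(\xi_i^{\mathrm{demo}}\right),
    \qquad
    w_s = \frac{\exp\!\left(-\theta^{\top} f(\xi_s)\right)}
               {\sum_{s'=1}^{S} \exp\!\left(-\theta^{\top} f(\xi_{s'})\right)},

where the intractable expectation over the continuous trajectory space is replaced by S sampled trajectories with self-normalized weights (assuming, for this sketch, a roughly uniform proposal over feasible trajectories). The efficiency of this approximation hinges entirely on how the trajectory samples are generated, which is where the continuous-domain sampler proposed in this letter comes in.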
