Decision and control are two core functionalities of highly automated vehicles. Current mainstream methods, such as functionality decomposition and end-to-end reinforcement learning (RL), suffer either from high time complexity or from poor interpretability and limited safety in real-world complex autonomous driving tasks. In this paper, we present an interpretable and efficient decision and control framework for automated vehicles that decomposes the driving task into multi-path planning and optimal tracking, structured hierarchically. First, multi-path planning generates several candidate paths considering only static constraints. Then, optimal tracking follows the optimal path while accounting for dynamic obstacles. To that end, we formulate a constrained optimal control problem (OCP) for each candidate path, optimize the OCPs separately, and select the path with the best tracking performance to follow. More importantly, we propose a model-based RL algorithm that serves as an approximate constrained-OCP solver, offloading the heavy computation through a paradigm of offline training and online application. Specifically, the OCPs for all paths are considered together to construct a multi-task RL problem, which our algorithm solves offline into value and policy networks used online for real-time path selection and tracking, respectively. We verify our framework in both simulation and the real world. Results show that, compared with baseline methods, our method achieves better online computing efficiency and driving performance, including traffic efficiency and safety. It also yields strong interpretability and adaptability across different driving tasks. A real-road test further suggests that it is applicable in complicated traffic scenarios without any tuning.
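The online phase described above (querying offline-trained value and policy networks to select and then track the best candidate path) can be sketched as follows. This is a minimal illustration under assumed interfaces, not the paper's implementation: the names `select_and_track`, `value_net`, and `policy_net` are hypothetical, and the toy networks stand in for the trained ones.

```python
import numpy as np

def select_and_track(state, candidate_paths, value_net, policy_net):
    """Pick the candidate path with the best predicted tracking performance
    (here: lowest cost-to-go under the learned value function), then compute
    the control action that tracks it with the learned policy."""
    # Evaluate the value network for each (state, path) pair.
    costs = [value_net(state, path) for path in candidate_paths]
    best = int(np.argmin(costs))  # index of the path to follow
    action = policy_net(state, candidate_paths[best])
    return best, action

# Toy stand-ins for the offline-trained networks (illustrative only).
value_net = lambda s, p: float(np.sum((s - p) ** 2))  # quadratic tracking cost
policy_net = lambda s, p: 0.1 * (p - s)               # proportional tracking law

state = np.array([0.0, 0.0])
paths = [np.array([1.0, 1.0]), np.array([0.2, 0.1])]
best, action = select_and_track(state, paths, value_net, policy_net)
```

Because both networks are fixed function evaluations at run time, the per-step cost is a handful of forward passes rather than solving the constrained OCPs online, which is the source of the computing-efficiency gain the abstract claims.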