Multi-Reward Architecture based Reinforcement Learning for Highway Driving Policies

A safe and efficient driving policy is essential for future autonomous highway driving. However, driving policies are hard to model because of the diversity of traffic scenes and the uncertainty of interactions with surrounding vehicles. State-of-the-art deep reinforcement learning methods struggle to learn good domain knowledge for highway driving policies with a single-reward architecture. This paper proposes Multi-Reward Architecture (MRA) based reinforcement learning for highway driving policies. The single reward function is decomposed into multiple reward functions that better represent the multi-dimensional nature of driving policies. Besides a large penalty for collisions, the overall reward is decomposed into three reward dimensions: a reward for speed, a reward for overtaking, and a reward for lane changes. Each reward then trains one branch of the Q-network to capture the corresponding domain knowledge. The Q-network is divided into two parts: a low-level network shared by three high-level branches, each of which approximates the Q-value for its respective reward function. The agent car chooses its action based on the sum of the Q vectors from the three branches. Experiments are conducted on a simulation platform that reproduces the highway driving process and provides the agent car with commonly used sensor data: images and point clouds. Experimental results show that the proposed method outperforms a DQN baseline with a single-reward architecture on three metrics: higher speed, lower lane-change frequency, and a larger number of overtakes, making it more efficient and safer for future autonomous highway driving.
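To make the decomposition concrete: the agent acts greedily on Q_total(s, a) = Q_speed(s, a) + Q_overtake(s, a) + Q_lane(s, a), where each branch is trained on its own reward component. Below is a minimal PyTorch sketch of such a network, assuming a flat state vector and a discrete action set; the layer sizes, module names, and trunk design are illustrative assumptions, not the paper's actual implementation.

# A minimal sketch of the MRA Q-network described in the abstract.
# State dimensionality, layer sizes, and names are illustrative assumptions.
import torch
import torch.nn as nn

class MRAQNetwork(nn.Module):
    """Shared low-level trunk with three high-level Q-value branches,
    one per decomposed reward (speed, overtaking, lane change)."""

    def __init__(self, state_dim: int = 64, num_actions: int = 5):
        super().__init__()
        # Low-level network shared by all three branches.
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        # One high-level branch per reward dimension; each approximates
        # the Q-values for its own reward function.
        self.heads = nn.ModuleList(
            [nn.Linear(128, num_actions) for _ in range(3)]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        features = self.trunk(state)
        # Shape: (3, batch, num_actions) -- one Q vector per branch.
        return torch.stack([head(features) for head in self.heads])

# Action selection: act greedily on the sum of the three Q vectors.
net = MRAQNetwork()
state = torch.randn(1, 64)          # dummy state for illustration
q_per_branch = net(state)           # (3, 1, num_actions)
q_total = q_per_branch.sum(dim=0)   # (1, num_actions)
action = q_total.argmax(dim=1)      # greedy action

During training, each head would presumably be regressed toward a TD target computed from its own reward component, with the collision penalty handled as the paper describes; those details are beyond this excerpt.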
