Uncertainty-Aware Model-Based Reinforcement Learning with Application to Autonomous Driving

To further improve the learning efficiency and performance of reinforcement learning (RL), this paper proposes a novel uncertainty-aware model-based RL (UA-MBRL) framework and then implements and validates it in autonomous driving under various task scenarios. First, an action-conditioned ensemble model capable of uncertainty assessment is established as the virtual environment model. Then, the UA-MBRL framework is developed based on an adaptive truncation approach, which provides virtual interactions between the agent and the environment model and improves RL’s training efficiency and performance. The developed algorithms are implemented in end-to-end autonomous vehicle control tasks, then validated and compared with state-of-the-art methods under various driving scenarios. The validation results suggest that the proposed UA-MBRL method surpasses existing model-based and model-free RL approaches in terms of learning efficiency and achieved performance. The results also demonstrate the adaptiveness and robustness of the proposed method under various autonomous driving scenarios.
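The abstract does not say how uncertainty is measured or how the truncation rule works, so the following is only a minimal sketch of the general idea under stated assumptions: uncertainty is taken to be the disagreement (standard deviation) among ensemble members, and a virtual rollout is cut off as soon as that disagreement exceeds a fixed threshold. The linear dynamics models and the names `make_ensemble`, `predict`, `virtual_rollout`, `noise`, and `threshold` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def make_ensemble(n_models, state_dim, action_dim, noise, seed=0):
    """Build hypothetical linear models s' = A s + B a (an assumption, not the paper's network).

    Each member gets its own random perturbation scaled by `noise`, which stands in
    for the disagreement that independently trained ensemble members would show.
    """
    rng = np.random.default_rng(seed)
    base_A = np.eye(state_dim)
    base_B = 0.1 * np.ones((state_dim, action_dim))
    models = []
    for _ in range(n_models):
        A = base_A + noise * rng.standard_normal((state_dim, state_dim))
        B = base_B + noise * rng.standard_normal((state_dim, action_dim))
        models.append((A, B))
    return models

def predict(models, state, action):
    """Return the ensemble-mean next state and a scalar uncertainty estimate."""
    preds = np.stack([A @ state + B @ action for A, B in models])
    mean = preds.mean(axis=0)
    # Assumed uncertainty measure: largest per-dimension std across members.
    uncertainty = float(preds.std(axis=0).max())
    return mean, uncertainty

def virtual_rollout(models, policy, state, max_horizon, threshold):
    """Roll the policy through the learned model, stopping early when it is unsure."""
    transitions = []
    for _ in range(max_horizon):
        action = policy(state)
        next_state, uncertainty = predict(models, state, action)
        if uncertainty > threshold:
            break  # adaptive truncation: discard predictions the ensemble disagrees on
        transitions.append((state, action, next_state))
        state = next_state
    return transitions

def policy(s):
    """Toy proportional controller, used only for illustration."""
    return -0.1 * s

start = np.ones(2)
confident = virtual_rollout(make_ensemble(5, 2, 2, noise=0.0), policy, start, 10, 0.05)
unsure = virtual_rollout(make_ensemble(5, 2, 2, noise=0.5), policy, start, 10, 1e-9)
print(len(confident), len(unsure))
```

In this toy setup an ensemble whose members agree produces a full-length virtual rollout, while one whose members disagree is truncated immediately, so only transitions the model is confident about would reach the agent's training data.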
