Weakly Supervised Reinforcement Learning for Autonomous Highway Driving via Virtual Safety Cages

The use of neural networks and reinforcement learning has become increasingly popular in autonomous vehicle control. However, the opaqueness of the resulting control policies presents a significant barrier to deploying neural network-based control in autonomous vehicles. In this paper, we present a reinforcement learning-based approach to autonomous vehicle longitudinal control, in which rule-based safety cages provide enhanced safety for the vehicle as well as weak supervision to the reinforcement learning agent. By guiding the agent towards meaningful states and actions, this weak supervision improves convergence during training and enhances the safety of the final trained policy. The rule-based supervisory controller has the further advantage of being fully interpretable, thereby enabling traditional validation and verification approaches to ensure the safety of the vehicle. We compare models with and without safety cages, as well as models with optimal and constrained parameters, and show that the weak supervision consistently improves the safety of exploration, the speed of convergence, and model performance. Additionally, we show that when the model parameters are constrained or sub-optimal, the safety cages enable a model to learn a safe driving policy even when it could not be trained to drive through reinforcement learning alone.
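To make the safety-cage idea concrete, the sketch below shows one way a rule-based supervisory controller could wrap a learned longitudinal policy: it overrides actions it judges unsafe and exposes the intervention as a penalty signal that can serve as weak supervision during training. This is a minimal illustration only; the class and parameter names, the simple time-headway rule, and all thresholds are assumptions for exposition, not the rules or values used in the paper.

```python
# Minimal sketch of a rule-based safety cage wrapping an RL longitudinal
# controller. All names, thresholds, and the time-headway rule are
# illustrative assumptions, not the paper's actual parameters.

from dataclasses import dataclass
import random


@dataclass
class LongitudinalState:
    ego_speed: float    # m/s
    gap_to_lead: float  # m
    lead_speed: float   # m/s


class SafetyCage:
    """Interpretable rule-based supervisory controller (hypothetical rules)."""

    def __init__(self, min_time_headway: float = 1.5, max_brake: float = -4.0):
        self.min_time_headway = min_time_headway  # s, assumed safety threshold
        self.max_brake = max_brake                # m/s^2, assumed hard-braking command

    def filter(self, state: LongitudinalState, proposed_accel: float):
        """Return (safe_accel, intervened) given the agent's proposed acceleration."""
        headway = state.gap_to_lead / max(state.ego_speed, 0.1)
        if headway < self.min_time_headway and proposed_accel > self.max_brake:
            # Headway too small and the agent is not braking hard enough:
            # override with a hard braking command.
            return self.max_brake, True
        return proposed_accel, False  # pass the agent's action through unchanged


def random_policy(state: LongitudinalState) -> float:
    """Stand-in for the learned policy's continuous acceleration output."""
    return random.uniform(-4.0, 2.0)


if __name__ == "__main__":
    cage = SafetyCage()
    state = LongitudinalState(ego_speed=25.0, gap_to_lead=20.0, lead_speed=20.0)
    action = random_policy(state)
    safe_action, intervened = cage.filter(state, action)
    # During training, an intervention can be converted into an extra penalty
    # added to the environment reward, providing the weak supervision signal.
    reward_penalty = -1.0 if intervened else 0.0
    print(safe_action, intervened, reward_penalty)
```

Because the cage is a small set of explicit rules, it can be validated with conventional verification techniques independently of the learned policy it supervises.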
