Multi-Agent Vulnerability Discovery for Autonomous Driving with Hazard Arbitration Reward

Discovering hazardous scenarios is crucial for testing and further improving driving policies. However, efficient driving-policy testing faces two key challenges. On the one hand, the probability of naturally encountering hazardous scenarios is low when testing a well-trained autonomous driving policy, so discovering these scenarios purely through real-world road testing is extremely costly. On the other hand, this task requires a proper determination of accident responsibility: collecting scenarios with wrongly attributed responsibility leads to an overly conservative driving policy. More specifically, we aim to discover hazardous scenarios for which the autonomous vehicle is responsible (AV-responsible), i.e., the vulnerabilities of the under-test driving policy. To this end, this work proposes a Safety Test framework that finds AV-Responsible Scenarios (STARS), based on multi-agent reinforcement learning. By introducing a Hazard Arbitration Reward (HAR), STARS guides other traffic participants to produce AV-responsible scenarios in which the under-test driving policy misbehaves. HAR enables our framework to discover diverse, complex, and AV-responsible hazardous scenarios. Experimental results against four different driving policies in three environments demonstrate that STARS effectively discovers AV-responsible hazardous scenarios. These scenarios correspond to genuine vulnerabilities of the under-test driving policies and are thus meaningful for their further improvement.
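To illustrate the idea behind a Hazard Arbitration Reward, the sketch below shows one minimal way an adversarial traffic agent could be rewarded only for collisions attributed to the under-test AV, while being penalized for collisions it causes itself. The function name, arguments, and reward values are hypothetical; the paper's actual HAR formulation may differ.

```python
def hazard_arbitration_reward(collision: bool,
                              av_responsible: bool,
                              hazard_bonus: float = 1.0,
                              self_fault_penalty: float = 1.0) -> float:
    """Hypothetical HAR-style reward for an adversarial traffic agent.

    The adversary earns a positive reward only when a collision occurs
    AND the responsibility arbitration attributes the fault to the
    under-test AV. Collisions caused by the adversary itself are
    penalized, steering the search toward AV-responsible scenarios.
    """
    if not collision:
        # No hazard produced in this step: no arbitration needed.
        return 0.0
    # Arbitration: reward AV-responsible hazards, penalize self-caused ones.
    return hazard_bonus if av_responsible else -self_fault_penalty
```

Under this kind of shaping, an adversary trained with standard multi-agent RL has no incentive to simply ram the AV; it must instead provoke situations where the under-test policy itself makes the faulty decision.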