Learning to Robustly Negotiate Bi-Directional Lane Usage in High-Conflict Driving Scenarios

Recently, autonomous driving has made substantial progress in addressing the most common traffic scenarios like intersection navigation and lane changing. However, most of these successes have been limited to scenarios with well-defined traffic rules and require minimal negotiation with other vehicles. In this paper, we introduce a previously unconsidered, yet everyday, high-conflict driving scenario requiring negotiations between agents of equal rights and priorities. There exists no centralized control structure and we do not allow communications. Therefore, it is unknown if other drivers are willing to cooperate, and if so to what extent. We train policies to robustly negotiate with opposing vehicles of an unobservable degree of cooperativeness using multi-agent reinforcement learning (MARL). We propose Discrete Asymmetric Soft Actor-Critic (DASAC), a maximum- entropy off-policy MARL algorithm allowing for centralized training with decentralized execution. We show that using DASAC we are able to successfully negotiate and traverse the scenario considered over 99% of the time. Our agents are robust to an unknown timing of opponent decisions, an unobservable degree of cooperativeness of the opposing vehicle, and previously unencountered policies. Furthermore, they learn to exhibit human-like behaviors such as defensive driving, anticipating solution options and interpreting the behavior of other agents.

[1]  Filippos Christianos,et al.  Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning , 2019, ArXiv.

[2]  Yi Wu,et al.  Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient , 2019, AAAI.

[3]  James Webb Game Theory: Decisions, Interaction and Evolution , 2006 .

[4]  Frans A. Oliehoek,et al.  A Concise Introduction to Decentralized POMDPs , 2016, SpringerBriefs in Intelligent Systems.

[5]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[6]  Matthias Althoff,et al.  CommonRoad: Composable benchmarks for motion planning on roads , 2017, 2017 IEEE Intelligent Vehicles Symposium (IV).

[7]  Shimon Whiteson,et al.  Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.

[8]  C.J. Tomlin,et al.  Autonomous Automobile Trajectory Tracking for Off-Road Driving: Controller Design, Experimental Validation and Racing , 2007, 2007 American Control Conference.

[9]  Masayoshi Tomizuka,et al.  Interaction-aware Decision Making with Adaptive Strategies under Merging Scenarios , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[10]  Shimon Whiteson,et al.  QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning , 2018, ICML.

[11]  Dorian Kodelja,et al.  Multiagent cooperation and competition with deep reinforcement learning , 2015, PloS one.

[12]  Matthias Althoff,et al.  Online Verification of Automated Road Vehicles Using Reachability Analysis , 2014, IEEE Transactions on Robotics.

[13]  Daniele Molinari,et al.  Microscopic Traffic Simulation by Cooperative Multi-agent Deep Reinforcement Learning , 2019, AAMAS.

[14]  Ming Tan,et al.  Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents , 1997, ICML.

[15]  Sergey Levine,et al.  Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.

[16]  John M. Dolan,et al.  POMDP and Hierarchical Options MDP with Continuous Actions for Autonomous Driving at Intersections , 2018, 2018 21st International Conference on Intelligent Transportation Systems (ITSC).

[17]  Petros Christodoulou,et al.  Soft Actor-Critic for Discrete Action Settings , 2019, ArXiv.

[18]  David Isele,et al.  CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning , 2018, ICLR.

[19]  Eduardo F. Morales,et al.  An Introduction to Reinforcement Learning , 2011 .

[20]  Guy Lever,et al.  Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward , 2018, AAMAS.

[21]  Shimon Whiteson,et al.  Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.

[22]  Yichuan Charlie Tang,et al.  Towards Learning Multi-Agent Negotiations via Self-Play , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[23]  Yi Wu,et al.  Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.

[24]  John M. Dolan,et al.  Attention-based Hierarchical Deep Reinforcement Learning for Lane Change Behaviors in Autonomous Driving , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[25]  John A. Michon,et al.  A critical view of driver behavior models: What do we know , 1985 .

[26]  Sergey Levine,et al.  Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.

[27]  Luis E. Ortiz,et al.  Longitudinal Position Control for Highway On-Ramp Merging: A Multi-Agent Approach to Automated Driving , 2019, 2019 IEEE Intelligent Transportation Systems Conference (ITSC).

[28]  Drew Wicke,et al.  Multiagent Soft Q-Learning , 2018, AAAI Spring Symposia.