Behavioral Evaluation of Hanabi Rainbow DQN Agents and Rule-Based Agents

Hanabi is a multiplayer cooperative card game in which each player sees everyone's cards except their own, and all players succeed or fail together. This makes the game an excellent testbed for studying collaboration. It has recently been shown that deep neural networks trained through self-play can play the game very well; however, such agents generally play poorly with partners they were not trained with. In this paper, we investigate the consequences of training Rainbow DQN agents with human-inspired rule-based agents. We analyze which partners Rainbow agents learn to play well with, and how well that playing skill transfers to agents unseen during training. We also analyze patterns of communication between agents to elucidate how collaboration happens. A key finding is that while most agents only learn to play well with partners seen during training, one particular rule-based agent leads the Rainbow algorithm toward a much more general policy. The metrics and hypotheses advanced in this paper can be used for further study of collaborative agents.
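
As a rough illustration of the cross-play evaluation described above, the sketch below pairs every agent in a pool with every partner and records the mean episode score in a matrix; low scores for pairings unseen during training would indicate a policy that does not transfer. This is a minimal, hypothetical sketch: the names (Agent, RandomAgent, play_episode) and the toy environment are stand-ins of our own, not the paper's code, which runs the actual Hanabi environment with trained Rainbow and rule-based policies.

```python
# Minimal sketch of a cross-play evaluation: pair every agent with every
# partner and record the mean episode score. The environment here is a toy
# placeholder; in the paper this loop would run Hanabi itself.
import random
from typing import Protocol


class Agent(Protocol):
    def act(self, observation: int, legal_moves: list[int]) -> int: ...


class RandomAgent:
    """Placeholder for a trained Rainbow or rule-based policy."""

    def act(self, observation: int, legal_moves: list[int]) -> int:
        return random.choice(legal_moves)


def play_episode(agents: list[Agent]) -> float:
    """Toy two-player episode; stands in for a full Hanabi game loop."""
    score, observation = 0.0, 0
    for turn in range(20):
        actor = agents[turn % len(agents)]
        move = actor.act(observation, legal_moves=[0, 1, 2])
        score += 1.0 if move == observation % 3 else 0.0  # toy reward
        observation += 1
    return score


def cross_play_matrix(agents: dict[str, Agent], episodes: int = 100):
    """Mean score for every ordered (agent, partner) pairing."""
    return {
        (a, b): sum(play_episode([agents[a], agents[b]])
                    for _ in range(episodes)) / episodes
        for a in agents
        for b in agents
    }


if __name__ == "__main__":
    pool = {"rainbow": RandomAgent(), "rule_based": RandomAgent()}
    for pair, mean_score in cross_play_matrix(pool).items():
        print(pair, round(mean_score, 2))
```

The same loop can additionally log each agent's action types per turn (hint color, hint rank, play, discard), which is one simple way to quantify the communication patterns the paper analyzes.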
