Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi

Hanabi is a cooperative game that brings the problem of modeling other players to the forefront. In this game, coordinated groups of players can leverage pre-established conventions to great effect, but playing in an ad-hoc setting requires agents to adapt to their partners' strategies without prior coordination. Evaluating an agent in this setting requires a diverse population of potential partners, but so far the behavioral diversity of such populations has not been considered systematically. This paper proposes Quality Diversity algorithms as a promising class of methods for generating diverse populations for this purpose, and uses MAP-Elites to generate a population of diverse Hanabi agents. We also postulate that agents can benefit from a diverse population during training, and implement a simple "meta-strategy" that adapts to a partner's perceived behavioral niche. We show that this meta-strategy can outperform generalist strategies, even against partners outside the population it was trained with, provided the partner's behavioral niche is correctly inferred; in practice, however, a partner's behavior depends on and interferes with the meta-agent's own behavior, suggesting an avenue for future research on characterizing another agent's behavior during gameplay.
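For intuition, a minimal sketch of the MAP-Elites loop and the niche-matching meta-strategy follows. This is an illustration under assumptions, not the paper's implementation: `random_agent_params`, `mutate`, `evaluate`, and `behavior_descriptor` are hypothetical callbacks, the behavior descriptor is assumed to lie in [0, 1] per dimension, and the grid resolution and iteration budget are arbitrary.

```python
import random

GRID_RESOLUTION = 10   # cells per behavioral dimension (assumed)
NUM_ITERATIONS = 10_000  # arbitrary evolution budget

def map_elites(random_agent_params, mutate, evaluate, behavior_descriptor):
    """Keep the highest-scoring agent ('elite') found in each behavioral cell."""
    archive = {}  # cell coordinates -> (fitness, agent parameters)

    for _ in range(NUM_ITERATIONS):
        if archive:
            # Mutate a randomly chosen existing elite...
            _, parent = random.choice(list(archive.values()))
            candidate = mutate(parent)
        else:
            # ...or bootstrap the archive with a random agent.
            candidate = random_agent_params()

        fitness = evaluate(candidate)                 # e.g., mean game score
        descriptor = behavior_descriptor(candidate)   # e.g., hint/play/discard rates
        cell = tuple(min(int(d * GRID_RESOLUTION), GRID_RESOLUTION - 1)
                     for d in descriptor)

        # Replace the cell's elite only if the candidate scores higher.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, candidate)

    return archive

def choose_responder(archive, partner_cell):
    """Meta-strategy: play the elite occupying the partner's inferred niche."""
    # If the inferred niche is empty, fall back to the best elite overall
    # (a simplifying assumption, not necessarily the paper's choice).
    if partner_cell in archive:
        return archive[partner_cell][1]
    return max(archive.values(), key=lambda fc: fc[0])[1]
```

The property the meta-strategy exploits is that MAP-Elites indexes its archive by behavior: once a partner's behavioral niche is inferred (the hard part, per the abstract), the archive directly yields a response policy matched to that niche.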
