Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi

Hanabi is a cooperative game that brings the problem of modeling other players to the forefront. In this game, coordinated groups of players can leverage pre-established conventions to great effect, but playing in an ad-hoc setting requires agents to adapt to their partners' strategies without prior coordination. Evaluating an agent in this setting requires a diverse population of potential partners, but so far the behavioral diversity of such populations has not been considered systematically. This paper proposes Quality Diversity algorithms as a promising class of methods for generating diverse populations for this purpose, and uses MAP-Elites to generate a population of diverse Hanabi agents. We also postulate that agents can benefit from a diverse population during training, and implement a simple "meta-strategy" that adapts to a partner's perceived behavioral niche. We show that this meta-strategy can outperform generalist strategies, even against partners outside the population it was trained with, provided the partner's behavioral niche is correctly inferred; in practice, however, a partner's behavior depends on and interferes with the meta-agent's own behavior, suggesting an avenue for future research on characterizing another agent's behavior during gameplay.
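For intuition, a minimal sketch of the MAP-Elites loop and the niche-matching meta-strategy follows. This is an illustration under assumptions, not the paper's implementation: `random_agent_params`, `mutate`, `evaluate`, and `behavior_descriptor` are hypothetical callbacks, the behavior descriptor is assumed to lie in [0, 1] per dimension, and the grid resolution and iteration budget are arbitrary.

```python
import random

GRID_RESOLUTION = 10   # cells per behavioral dimension (assumed)
NUM_ITERATIONS = 10_000  # arbitrary evolution budget

def map_elites(random_agent_params, mutate, evaluate, behavior_descriptor):
    """Keep the highest-scoring agent ('elite') found in each behavioral cell."""
    archive = {}  # cell coordinates -> (fitness, agent parameters)

    for _ in range(NUM_ITERATIONS):
        if archive:
            # Mutate a randomly chosen existing elite...
            _, parent = random.choice(list(archive.values()))
            candidate = mutate(parent)
        else:
            # ...or bootstrap the archive with a random agent.
            candidate = random_agent_params()

        fitness = evaluate(candidate)                 # e.g., mean game score
        descriptor = behavior_descriptor(candidate)   # e.g., hint/play/discard rates
        cell = tuple(min(int(d * GRID_RESOLUTION), GRID_RESOLUTION - 1)
                     for d in descriptor)

        # Replace the cell's elite only if the candidate scores higher.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, candidate)

    return archive

def choose_responder(archive, partner_cell):
    """Meta-strategy: play the elite occupying the partner's inferred niche."""
    # If the inferred niche is empty, fall back to the best elite overall
    # (a simplifying assumption, not necessarily the paper's choice).
    if partner_cell in archive:
        return archive[partner_cell][1]
    return max(archive.values(), key=lambda fc: fc[0])[1]
```

The property the meta-strategy exploits is that MAP-Elites indexes its archive by behavior: once a partner's behavioral niche is inferred (the hard part, per the abstract), the archive directly yields a response policy matched to that niche.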
