Cooperative Open-ended Learning Framework for Zero-shot Coordination

Zero-shot coordination, i.e., the ability to coordinate effectively with a wide range of unseen partners, remains a significant challenge in cooperative artificial intelligence (AI). Previous algorithms have attempted to address this challenge by optimizing fixed objectives within a population to improve strategy or behavior diversity. However, these approaches can result in a loss of learning and an inability to cooperate with certain strategies within the population, a phenomenon known as cooperative incompatibility. To address this issue, we propose the Cooperative Open-ended LEarning (COLE) framework, which constructs open-ended objectives in two-player cooperative games from the perspective of graph theory to assess and identify the cooperative ability of each strategy. We further instantiate the framework as a practical algorithm that leverages knowledge from game theory and graph theory. An analysis of the algorithm's learning process shows that it can efficiently overcome cooperative incompatibility. Experimental results in the Overcooked game environment demonstrate that our method outperforms current state-of-the-art methods when coordinating with partners of different skill levels. Our demo is available at https://sites.google.com/view/cole-2023.
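To make the graph-theoretic idea concrete, the sketch below shows one plausible reading of "assessing the cooperative ability of each strategy": treat the population as a weighted graph whose edge weights are cross-play payoffs, then score each node with a PageRank-style centrality. This is a minimal illustration, not the authors' implementation; the function name `cooperative_ability` and the toy payoff matrix are hypothetical.

```python
# Hypothetical sketch of evaluating a population's strategies via a payoff
# graph, assuming a weighted-PageRank-style centrality as the "cooperative
# ability" score. Names and numbers are illustrative, not the COLE API.
import numpy as np

def cooperative_ability(payoff_matrix: np.ndarray,
                        damping: float = 0.85,
                        iters: int = 100) -> np.ndarray:
    """Score strategies by a PageRank-style centrality on the payoff graph.

    payoff_matrix[i, j] is the cooperative return when strategy i
    partners with strategy j (rows/columns index the population).
    """
    n = payoff_matrix.shape[0]
    # Column-normalize payoffs so each strategy distributes one unit of
    # "vote" to its partners in proportion to how well they cooperate.
    col_sums = payoff_matrix.sum(axis=0, keepdims=True)
    transition = payoff_matrix / np.where(col_sums == 0, 1.0, col_sums)
    score = np.full(n, 1.0 / n)
    for _ in range(iters):
        # Standard damped power iteration on the column-stochastic matrix.
        score = (1 - damping) / n + damping * transition @ score
    return score / score.sum()

# Toy population: strategy 2 cooperates well with everyone, strategy 0 mainly
# with itself -- the ranking should reward broad compatibility.
payoffs = np.array([[10.0, 1.0, 6.0],
                    [1.0, 8.0, 7.0],
                    [6.0, 7.0, 9.0]])
print(cooperative_ability(payoffs))
```

Under this reading, a strategy that earns high returns with many partners (not just with itself) receives a high score, which is one way a graph-based objective could expose cooperative incompatibility that a fixed population objective would miss.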
