Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination

Coordinating with previously unseen partners remains a central challenge in Zero-Shot Human-AI Coordination, which aims to develop AI agents that can cooperate effectively with human teammates they have never encountered. Traditional algorithms approach this by optimizing a fixed objective over a population of partners, encouraging diversity in strategies and behaviors. However, these techniques can suffer from learning loss and fail to cooperate with certain strategies within the population, a phenomenon we term cooperative incompatibility. To mitigate this issue, we introduce the Cooperative Open-ended LEarning (COLE) framework, which formulates open-ended objectives in two-player cooperative games from a graph-theoretic perspective to evaluate and identify the cooperative capacity of each strategy. We propose a practical algorithm that draws on concepts from game theory and graph theory, such as the Shapley value and graph centrality. We show, through both theoretical and empirical analysis, that COLE effectively overcomes cooperative incompatibility. We further built the COLE platform, an online Overcooked human-AI experiment platform that allows easy customization of questionnaires, model weights, and other components. Using this platform, we recruited 130 participants for human studies. The results reveal a preference for our approach over state-of-the-art methods across a variety of subjective metrics. Moreover, objective results in the Overcooked game environment show that our method outperforms existing approaches when coordinating with previously unseen AI agents and a human proxy model. Our code and demo are publicly available at https://sites.google.com/view/cole-2023.
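To make the abstract's evaluation idea concrete, the sketch below illustrates how the cooperative capacity of each strategy in a population might be scored with the two tools the paper names: a Monte Carlo Shapley-value estimate and a weighted PageRank centrality over the pairwise-payoff graph. This is a minimal illustration under our own assumptions; the characteristic function, the toy payoff numbers, and the helper names (shapley_values, weighted_pagerank) are hypothetical and do not reproduce the exact COLE objective.

```python
import numpy as np

def shapley_values(payoff, n_samples=2000, seed=0):
    """Monte Carlo estimate of each strategy's Shapley value.

    Assumption: the value of a coalition of strategies is the mean
    pairwise payoff inside it. This is illustrative, not the paper's
    exact formulation.
    """
    rng = np.random.default_rng(seed)
    n = payoff.shape[0]

    def coalition_value(members):
        if len(members) < 2:
            return 0.0
        return payoff[np.ix_(members, members)].mean()

    values = np.zeros(n)
    for _ in range(n_samples):
        order = rng.permutation(n)
        members, prev = [], 0.0
        for i in order:
            members.append(i)
            cur = coalition_value(members)
            values[i] += cur - prev  # marginal contribution of strategy i
            prev = cur
    return values / n_samples

def weighted_pagerank(payoff, damping=0.85, tol=1e-8, max_iter=200):
    """Power-iteration PageRank on the cooperation graph whose edge
    weights are pairwise payoffs; a high score marks a strategy that
    cooperates well with other well-cooperating strategies."""
    w = payoff.astype(float).copy()
    np.fill_diagonal(w, 0.0)
    col_sums = w.sum(axis=0)
    col_sums[col_sums == 0.0] = 1.0     # guard against isolated nodes
    p = w / col_sums                    # column-stochastic transitions
    n = w.shape[0]
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * (p @ rank)
        if np.abs(new_rank - rank).sum() < tol:
            break
        rank = new_rank
    return rank

# Toy symmetric payoff matrix for a 4-strategy population; strategy 3
# coordinates poorly with everyone, a candidate case of cooperative
# incompatibility that both scores should flag with low values.
payoff = np.array([
    [0.0, 8.0, 7.0, 1.0],
    [8.0, 0.0, 6.0, 2.0],
    [7.0, 6.0, 0.0, 1.0],
    [1.0, 2.0, 1.0, 0.0],
])

print("Shapley estimates:  ", shapley_values(payoff).round(2))
print("PageRank centrality:", weighted_pagerank(payoff).round(3))
```

On this toy matrix, both measures assign their lowest score to strategy 3, which is how a low cooperative-capacity strategy in the population could be identified and targeted during training.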
