Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot

Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks). Our contribution, Melting Pot, is a MARL evaluation suite that fills this gap and uses reinforcement learning to reduce the human labor required to create novel test scenarios. This works because one agent’s behavior constitutes (part of) another agent’s environment. To demonstrate scalability, we have created over 80 unique test scenarios covering a broad range of research topics such as social dilemmas, reciprocity, resource sharing, and task partitioning. We apply these test scenarios to standard MARL training algorithms and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.
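Concretely, each test scenario pairs a physical environment (a "substrate") with a fixed population of pretrained background agents, and the focal population under test is scored by its per-capita return when dropped into that scenario. The sketch below illustrates this evaluation protocol under generic assumptions; the `MultiAgentEnv` interface and the `evaluate_scenario` helper are hypothetical illustrations of the idea, not the actual Melting Pot API.

```python
# A minimal sketch of scenario-based evaluation, assuming a generic
# multi-agent environment interface. `MultiAgentEnv`, `Policy`, and
# `evaluate_scenario` are hypothetical names, not the Melting Pot API.
from typing import Callable, List, Sequence

Observation = object
Action = int
Policy = Callable[[Observation], Action]


def evaluate_scenario(
    env_factory: Callable[[], "MultiAgentEnv"],
    focal_policies: Sequence[Policy],
    background_policies: Sequence[Policy],
    episodes: int = 10,
) -> float:
    """Returns the mean per-capita return of the focal population.

    Background policies are held fixed (pretrained via RL), so each
    scenario is a stable, reusable test of how well the focal agents
    generalize to partners and opponents they did not train with.
    """
    total_return = 0.0
    # Assumed convention: focal agents occupy the first player slots.
    policies: List[Policy] = list(focal_policies) + list(background_policies)
    for _ in range(episodes):
        env = env_factory()
        observations = env.reset()
        done = False
        while not done:
            actions = [pi(obs) for pi, obs in zip(policies, observations)]
            observations, rewards, done = env.step(actions)
            # Only the focal agents' rewards count toward the score.
            total_return += sum(rewards[: len(focal_policies)])
    return total_return / (episodes * len(focal_policies))
```

Because the background agents are frozen, the same scenario can be reused to compare any number of focal populations on equal footing, which is what lets the suite scale to many scenarios without additional human authoring effort.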
