Melting Pot 2.0

Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios. Each scenario pairs a physical environment (a "substrate") with a reference set of co-players (a "background population"), to create a social situation with substantial interdependence between the individuals involved. For instance, some scenarios were inspired by institutional-economics-based accounts of natural resource management and public-good-provision dilemmas. Others were inspired by considerations from evolutionary biology, game theory, and artificial life. Melting Pot aims to cover a maximally diverse set of interdependencies and incentives. It includes the commonly studied extreme cases of perfectly competitive (zero-sum) motivations and perfectly cooperative (shared-reward) motivations, but does not stop with them. As in real life, a clear majority of scenarios in Melting Pot have mixed incentives. They are neither purely competitive nor purely cooperative, and thus demand that successful agents be able to navigate the resulting ambiguity. Here we describe Melting Pot 2.0, which revises and expands on Melting Pot. We also introduce support for scenarios with asymmetric roles, and explain how to integrate them into the evaluation protocol. This report also contains: (1) details of all substrates and scenarios; (2) a complete description of all baseline algorithms and results. Our intention is for it to serve as a reference for researchers using Melting Pot 2.0.

Joel Z. Leibo | Edgar A. Duéñez-Guzmán | Michael Bradley Johanson | J. Agapiou | A. Vezhnevets | Igor Mordatch | D. Mobbs | Peter Sunehag | D. Strouse | R. Koster | Udari Madhushani | Julia Haas | Yiran Mao | R. Comanescu | Jayd Matyas | Kavya Kopparapu | Sukhdeep Singh
