TiKick: Towards Playing Multi-agent Football Full Games from Single-agent Demonstrations

Deep reinforcement learning (DRL) has achieved super-human performance on complex video games (e.g., StarCraft II and Dota 2). However, current DRL systems still suffer from challenges of multi-agent coordination, sparse rewards, stochastic environments, etc. In seeking to address these challenges, we employ a football video game, namely Google Research Football (GRF), as our testbed and develop an end-to-end learning-based AI system (denoted as TiKick) to complete this challenging task. In this work, we first generated a large replay dataset from the self-play of single-agent experts, which were obtained from league training. We then developed a distributed learning system and new offline algorithms to learn a powerful multi-agent AI from the fixed single-agent dataset. To the best of our knowledge, TiKick is the first learning-based AI system that can take over the multi-agent Google Research Football full game, while previous work could either control a single agent or experiment on toy academic scenarios. Extensive experiments further show that our pre-trained model can accelerate the training process of modern multi-agent algorithms and that our method achieves state-of-the-art performance on various academic scenarios.

Code is available at https://github.com/TARTRL/TiKick. Videos are available at https://sites.google.com/view/tikick.
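The offline algorithms themselves are not detailed in this abstract. As an illustrative sketch only, the snippet below shows the simplest instance of the "learn a multi-agent policy from a fixed dataset of single-agent demonstrations" setup described above: behavior cloning on (observation, action) pairs from replays. The dimensions, the synthetic data, and names such as `policy` are hypothetical stand-ins, not the actual TiKick implementation.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dimensions: a 115-dim observation vector and 19 discrete actions
# (roughly matching GRF defaults) are used here purely for illustration; the
# real TiKick dataset layout may differ.
OBS_DIM, NUM_ACTIONS = 115, 19

# Synthetic stand-in for a fixed replay dataset of (observation, action) pairs
# collected from single-agent expert self-play.
obs = torch.randn(4096, OBS_DIM)
acts = torch.randint(0, NUM_ACTIONS, (4096,))
loader = DataLoader(TensorDataset(obs, acts), batch_size=256, shuffle=True)

# A small policy network shared by all controlled players.
policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, NUM_ACTIONS),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# Plain behavior cloning: maximize the log-likelihood of demonstrated actions.
for epoch in range(5):
    for batch_obs, batch_acts in loader:
        logits = policy(batch_obs)
        loss = nn.functional.cross_entropy(logits, batch_acts)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```

The actual system goes beyond plain behavior cloning (it uses distributed training and new offline objectives), but the data flow is the same: a fixed dataset of demonstrated actions drives supervised policy updates, with no further environment interaction during training.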
