Learning to Play against Any Mixture of Opponents

Intuitively, experience playing against one mixture of opponents in a given domain should be relevant against a different mixture in the same domain. We propose a transfer learning method, Q-Mixing, that starts by learning Q-values against each pure-strategy opponent. A Q-value for any distribution over opponent strategies is then approximated by averaging the separately learned Q-values, weighted by that distribution. From these components, we construct policies against all opponent mixtures without any further training. We empirically validate Q-Mixing in two environments: a simple grid-world soccer environment and a complicated cyber-security game. We find that Q-Mixing successfully transfers knowledge across mixtures of opponents. We next consider using observations made during play to update the believed distribution over opponents. We introduce an opponent classifier, trained in parallel to Q-learning using the same data, and use its output to refine the mixing of Q-values. We find that Q-Mixing augmented with the opponent classifier performs comparably to training directly against a mixed-strategy opponent, and with lower variance.
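
The core computation is simple enough to sketch. The following is a minimal Python illustration of (i) mixing separately learned Q-values according to the believed opponent distribution and (ii) refining that belief from an observation; the function names are hypothetical, the Bayesian form of the update is an assumption, and the paper's classifier and weighting details may differ:

```python
import numpy as np

def q_mixing_policy(q_values, mixture):
    # q_values: (num_opponents, num_actions) array of Q_i(s, a) at the
    # current state s, each row learned against one pure-strategy opponent.
    # mixture: (num_opponents,) probability vector over opponents.
    mixed_q = mixture @ q_values       # expected Q-value under the mixture
    return int(np.argmax(mixed_q))     # act greedily w.r.t. the mixed Q

def refine_mixture(prior, likelihoods):
    # prior: (num_opponents,) current belief over opponents.
    # likelihoods: (num_opponents,) classifier estimate of
    # P(observation | opponent i) for the latest observation.
    posterior = prior * likelihoods
    return posterior / posterior.sum()

# Example: two opponents, two actions; evidence favors opponent 0.
q_values = np.array([[1.0, 0.0],
                     [0.2, 0.8]])
belief = refine_mixture(np.array([0.5, 0.5]), np.array([0.9, 0.1]))
action = q_mixing_policy(q_values, belief)
```

Because the mixed Q-value is just a probability-weighted average, no additional training is needed when the opponent distribution changes; only the weights in the average move.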
