Opponent Modeling in Deep Reinforcement Learning

Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because strategies interact with each other and change. Most previous work focuses on developing probabilistic models or parameterized strategies for specific applications. Inspired by the recent success of deep reinforcement learning, we present neural-based models that jointly learn a policy and the behavior of opponents. Instead of explicitly predicting the opponent's action, we encode observations of the opponents into a deep Q-Network (DQN), while retaining explicit modeling (if desired) under a multitask learning framework. By using a Mixture-of-Experts architecture, our model automatically discovers different opponent strategy patterns without extra supervision. We evaluate our models on a simulated soccer game and a popular trivia game, showing superior performance over DQN and its variants.
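
To make the architectural idea concrete, the following is a minimal sketch (in PyTorch, with made-up dimension names such as `state_dim`, `opp_dim`, and `n_experts`) of a Q-network that encodes features of the opponent's observed behavior and uses a Mixture-of-Experts gate over several Q-value heads. It illustrates the general pattern described in the abstract and is not the authors' released implementation.

```python
# Hypothetical sketch: opponent-conditioned Mixture-of-Experts Q-network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OpponentMoEQNetwork(nn.Module):
    def __init__(self, state_dim, opp_dim, n_actions, n_experts=4, hidden=64):
        super().__init__()
        # Encoder for the agent's own state observation.
        self.state_enc = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # Encoder for features summarizing the opponent's recent behavior.
        self.opp_enc = nn.Sequential(nn.Linear(opp_dim, hidden), nn.ReLU())
        # One Q-value head ("expert") per latent opponent strategy pattern.
        self.experts = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_experts)]
        )
        # Gating network: weights the experts from the opponent encoding alone.
        self.gate = nn.Linear(hidden, n_experts)

    def forward(self, state, opp_obs):
        hs = self.state_enc(state)            # (batch, hidden)
        ho = self.opp_enc(opp_obs)            # (batch, hidden)
        w = F.softmax(self.gate(ho), dim=-1)  # (batch, n_experts)
        # Each expert predicts Q-values from the state encoding.
        q_experts = torch.stack([e(hs) for e in self.experts], dim=1)  # (batch, n_experts, n_actions)
        # Combine expert Q-values using the opponent-conditioned gate.
        return (w.unsqueeze(-1) * q_experts).sum(dim=1)                # (batch, n_actions)

# Example usage with random inputs:
# q = OpponentMoEQNetwork(10, 6, 5)(torch.randn(32, 10), torch.randn(32, 6))
```

The gate sees only the opponent encoding, so expert weights can shift as the opponent's strategy changes, which is one plausible way to realize the "discovers different opponent strategy patterns without extra supervision" claim.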
