Towards a Fast Detection of Opponents in Repeated Stochastic Games

Multi-agent algorithms aim to find the best response in strategic interactions. While many state-of-the-art algorithms assume repeated interaction with a fixed set of opponents (or even self-play), a learner in the real world is more likely to encounter the same strategic situation with changing counterparties. This article presents a formal model of such sequential interactions, together with an algorithm that combines two established frameworks: Pepper and Bayesian policy reuse. In each interaction, the algorithm faces a repeated stochastic game with an unknown (small) number of repetitions against a random opponent drawn from a population, without observing the opponent's identity. Our algorithm has two main components: first, it draws on multi-agent learning algorithms to obtain acting policies in stochastic games; second, it maintains a belief over the possible opponents that is updated as the interaction unfolds. This belief allows the agent to quickly select the appropriate policy for the current opponent. Our results show that the opponent is detected quickly from its behavior, yielding higher average rewards than the state-of-the-art baseline Pepper in repeated stochastic games.
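The belief-maintenance-and-selection loop described above can be illustrated with a short sketch in the spirit of Bayesian policy reuse. This is a minimal illustration under assumed ingredients, not the paper's actual implementation: it presumes a discrete set of known opponent types, a library of pre-learned policies (e.g., obtained offline with a Pepper-style learner), a performance model `utility[i][j]` giving the expected return of policy j against type i, and an observation likelihood model; all names here are hypothetical.

```python
import numpy as np

class OpponentBeliefAgent:
    """Sketch of belief tracking over opponent types with policy reuse.

    All components (types, policies, utility table, likelihood) are
    illustrative assumptions supplied by the caller.
    """

    def __init__(self, opponent_types, policies, utility, likelihood, prior=None):
        # opponent_types: list of known opponent types tau_1..tau_n
        # policies:       list of reusable policies pi_1..pi_m
        # utility[i][j]:  expected return of policy j against type i
        # likelihood(obs, tau, pi): P(obs | opponent=tau, policy=pi)
        self.types = opponent_types
        self.policies = policies
        self.utility = np.asarray(utility, dtype=float)
        self.likelihood = likelihood
        n = len(opponent_types)
        self.belief = np.full(n, 1.0 / n) if prior is None else np.asarray(prior, dtype=float)

    def select_policy(self):
        # Choose the policy with the highest expected utility under the
        # current belief over opponent types.
        expected = self.belief @ self.utility  # shape: (num_policies,)
        return int(np.argmax(expected))

    def update_belief(self, obs, policy_idx):
        # Bayesian update: reweight each opponent type by how well it
        # explains the observed signal (e.g., the reward just received).
        pi = self.policies[policy_idx]
        like = np.array([self.likelihood(obs, tau, pi) for tau in self.types])
        posterior = like * self.belief
        total = posterior.sum()
        # If no type explains the observation, fall back to a uniform belief.
        self.belief = posterior / total if total > 0 else np.full(len(self.types), 1.0 / len(self.types))
```

In use, the agent would call select_policy at the start of each round of the repeated stochastic game, act with the chosen policy, and then call update_belief with the observed signal; because the belief concentrates on the true type as evidence accumulates, policy selection improves within the small number of repetitions the interaction allows.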
