Balancing Safety and Exploitability in Opponent Modeling

Opponent modeling is a critical mechanism in repeated games: it allows a player to adapt its strategy to better respond to the presumed preferences of its opponents. We introduce a new modeling technique that adaptively balances exploitability and risk reduction. The opponent's strategy is modeled with a set of possible strategies that contains the actual strategy with high probability. The resulting algorithm is safe, as its expected payoff stays above the minimax payoff with high probability, yet it can exploit the opponent's preferences once sufficient observations have been obtained. We apply the technique to normal-form games and to stochastic games with a finite number of stages. Its performance is first demonstrated on repeated rock-paper-scissors games. Subsequently, the approach is evaluated in a human-robot table-tennis setting, where the robot learns to prepare to return a served ball: by modeling the human player, the robot chooses a forehand, backhand, or middle preparation pose before the human serves. The learned strategies exploit the opponent's preferences, leading to a higher rate of successful returns.
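The safety mechanism described above can be illustrated with a minimal sketch for repeated rock-paper-scissors. This is not the paper's algorithm, only an assumed simplification of the idea: maintain an empirical model of the opponent's mixed strategy, surround it with an L1 confidence set (a Weissman-style multinomial concentration bound is assumed here), and deviate from the minimax strategy only when the best response's payoff remains above the minimax value (0 for rock-paper-scissors) for every strategy in the set. The function name `safe_best_response` and the specific confidence radius are illustrative choices, not from the source.

```python
import math

# Row player's payoff matrix for rock-paper-scissors
# (rows: our action R, P, S; columns: opponent's action R, P, S).
PAYOFF = [
    [0, -1, 1],    # Rock:     ties R, loses to P, beats S
    [1, 0, -1],    # Paper:    beats R, ties P, loses to S
    [-1, 1, 0],    # Scissors: loses to R, beats P, ties S
]
MINIMAX = [1 / 3, 1 / 3, 1 / 3]  # maximin strategy; the game value is 0


def safe_best_response(counts, delta=0.05):
    """Return a mixed strategy given counts of the opponent's past actions.

    Exploits the empirical opponent model only when the worst case over
    the confidence set still beats the minimax value of 0.
    """
    n = sum(counts)
    if n == 0:
        return MINIMAX  # no observations: fall back to the safe strategy
    p_hat = [c / n for c in counts]
    # L1 radius for a 3-outcome multinomial: with probability >= 1 - delta,
    # ||p_hat - p||_1 <= eps (Weissman-style bound, used here as an example).
    eps = math.sqrt(2 * math.log((2 ** 3 - 2) / delta) / n)
    # Expected payoff of each pure action against the empirical model.
    values = [sum(a * p for a, p in zip(row, p_hat)) for row in PAYOFF]
    best = max(range(3), key=lambda a: values[a])
    # Payoffs lie in [-1, 1], so over the L1 ball the payoff can drop
    # by at most eps below the empirical estimate.
    worst_case = values[best] - eps
    if worst_case > 0:  # provably above the minimax payoff: exploit
        strategy = [0.0, 0.0, 0.0]
        strategy[best] = 1.0
        return strategy
    return MINIMAX  # not enough evidence: stay safe
```

With no data or only a few observations the radius `eps` dominates and the uniform minimax strategy is played; after, say, 100 observed Rocks the confidence set shrinks enough that the sketch commits to Paper.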