Opponent modeling is a critical mechanism in repeated games. It allows a player to adapt its strategy to better respond to the presumed preferences of its opponents. We introduce a modeling technique that adaptively balances safety and exploitability. The opponent's strategy is modeled by a set of possible strategies that contains the actual strategy with high probability. The algorithm is safe in that its expected payoff is above the minimax payoff with high probability, and it can exploit the opponent's preferences once sufficient observations have been obtained. We apply the algorithm to a robot table-tennis setting in which the robot player learns to prepare to return a served ball. By modeling the human players, the robot chooses a forehand, backhand, or middle preparation pose before the opponent serves. The learned strategies can exploit the opponent's preferences, leading to a higher rate of successful returns.