Online implicit agent modelling

The traditional approach to agent modelling is to infer the explicit parameters of another agent's strategy (i.e., the probability that it takes each action in each situation). Unfortunately, in complex domains with high-dimensional strategy spaces, modelling every parameter often requires a prohibitive number of observations. Furthermore, even given a model of such a strategy, computing a response that is robust to modelling error may be impractical online. Instead, we propose an implicit modelling framework in which an agent estimates the utility of each strategy in a fixed portfolio of pre-computed strategies. Using the domain of heads-up limit Texas hold'em poker, this work describes an end-to-end approach for building an implicit modelling agent. We compute robust response strategies, show how to select strategies for the portfolio, and apply existing variance reduction and online learning techniques to dynamically adapt the agent's strategy to its opponent. We validate the approach by showing that our implicit modelling agent would have won the heads-up limit opponent exploitation event in the 2011 Annual Computer Poker Competition.
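To make the idea of adapting over a fixed portfolio concrete, the following is a minimal sketch that treats each pre-computed strategy as an arm of an adversarial multi-armed bandit and selects among them with the standard Exp3 update. This is an illustration under stated assumptions, not necessarily the online learning technique used in this work: the class name `Exp3Portfolio`, the helper `simulate`, the Bernoulli payoff model, and the rescaling of per-hand utility to [0, 1] are all hypothetical, and no variance reduction is included.

```python
import math
import random


class Exp3Portfolio:
    """Exp3 bandit over a fixed portfolio of strategies (illustrative only)."""

    def __init__(self, num_strategies, gamma=0.1, rng=None):
        if num_strategies < 1:
            raise ValueError("portfolio must contain at least one strategy")
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must lie in (0, 1]")
        self.k = num_strategies
        self.gamma = gamma              # exploration rate
        self.weights = [1.0] * self.k   # one weight per portfolio strategy
        self.rng = rng or random.Random()

    def probabilities(self):
        """Mix the weight-proportional distribution with uniform exploration."""
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / self.k
                for w in self.weights]

    def choose(self):
        """Sample the index of the strategy to play for the next hand."""
        return self.rng.choices(range(self.k), weights=self.probabilities())[0]

    def update(self, chosen, reward):
        """Update after observing `reward` in [0, 1] for strategy `chosen`."""
        if not 0.0 <= reward <= 1.0:
            raise ValueError("reward must be rescaled to [0, 1]")
        probs = self.probabilities()
        # Importance-weighted estimate: unbiased although only one arm is seen.
        estimate = reward / probs[chosen]
        self.weights[chosen] *= math.exp(self.gamma * estimate / self.k)
        # Rescale by the largest weight to avoid overflow; probabilities
        # depend only on weight ratios, so this leaves them unchanged.
        largest = max(self.weights)
        self.weights = [w / largest for w in self.weights]


def simulate(mean_payoffs, rounds=20000, gamma=0.1, seed=0):
    """Toy opponent: strategy i pays 1 with probability mean_payoffs[i]."""
    rng = random.Random(seed)
    bandit = Exp3Portfolio(len(mean_payoffs), gamma=gamma, rng=rng)
    for _ in range(rounds):
        i = bandit.choose()
        reward = 1.0 if rng.random() < mean_payoffs[i] else 0.0
        bandit.update(i, reward)
    return bandit.probabilities()


if __name__ == "__main__":
    # Hypothetical portfolio of three responses with different true payoffs.
    print(simulate([0.3, 0.5, 0.8]))
```

The sketch only needs one scalar utility estimate per portfolio strategy, rather than a model of every parameter of the opponent's strategy, which is the contrast the abstract draws with explicit modelling. In a real poker setting the per-hand payoff would first have to be mapped into [0, 1], and the high variance of raw outcomes is one reason variance reduction matters.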
