Approximate exploitability: Learning a best response in large games

A common metric in games of imperfect information is exploitability, i.e., the performance of a policy against its worst-case opponent. Exploitability has many desirable properties, but it is intractable to compute in large games because it requires a full traversal of the game tree to calculate a best response to the given policy. We introduce a new metric, approximate exploitability, which mirrors exploitability but substitutes an approximate best response for the exact one; the approach scales to large games with tractable belief spaces. We focus on the two-player, zero-sum case. We provide empirical results for a specific instance of the method, showing that it can effectively exploit agents in large games. Our method converges to exploitability in the tabular setting and in the function approximation setting for small games, and it consistently finds exploits for weak policies in large games, with results on Chess, Go, Heads-up No Limit Texas Hold'em, and other games.

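To make the two quantities concrete, the following is a minimal sketch of the standard definitions for the two-player, zero-sum case, assuming u_i denotes player i's expected return under a joint policy and using the common convention of averaging over both players (the paper's own notation and normalization may differ):

\[
\mathrm{Expl}(\pi) = \tfrac{1}{2}\Big( \max_{\pi_1'} u_1(\pi_1', \pi_2) + \max_{\pi_2'} u_2(\pi_1, \pi_2') \Big),
\qquad
\widehat{\mathrm{Expl}}(\pi) = \tfrac{1}{2}\Big( u_1(\hat{b}_1, \pi_2) + u_2(\pi_1, \hat{b}_2) \Big),
\]

where \hat{b}_1 and \hat{b}_2 are learned approximate best responses to \pi_2 and \pi_1, respectively (hypothetical symbols, not taken from the paper). Since each learned response can achieve at most the exact best-response value, \widehat{\mathrm{Expl}}(\pi) \le \mathrm{Expl}(\pi), so approximate exploitability is a lower bound on true exploitability.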