MCRNR: Fast Computing of Restricted Nash Responses by Means of Sampling

This paper presents a sample-based algorithm for computing restricted Nash strategies in complex extensive-form games. Recent work indicates that regret-minimization algorithms using selective sampling, such as Monte-Carlo Counterfactual Regret Minimization (MCCFR), converge faster to Nash equilibrium (NE) strategies than their non-sampled counterparts, which perform a full tree traversal. In this paper, we show that MCCFR is also able to establish NE strategies in the complex domain of Poker. Although such strategies are defensive (i.e., safe to play), they are oblivious to opponent mistakes. We can thus achieve better performance by using (an estimate of) the opponent's strategy. The Restricted Nash Response (RNR) algorithm was proposed to learn robust counter-strategies given such knowledge. It solves a modified game in which the opponent is assumed to play according to a fixed strategy with a certain probability, and according to a regret-minimizing strategy otherwise. We improve the rate of convergence of the RNR algorithm using sampling. Our new algorithm, MCRNR, samples only relevant parts of the game tree. It is therefore able to converge faster to robust best-response strategies than RNR. We evaluate our algorithm on a variety of imperfect-information games that are small enough to solve yet large enough to be strategically interesting, as well as on a large game, Texas Hold'em Poker.
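
To make the modified game mentioned above concrete, the following display is an illustrative sketch of the restricted-game construction for the two-player zero-sum case; the notation (the restriction probability $p$, the fixed opponent strategy $\sigma_{\mathrm{fix}}$, the strategy sets $\Sigma_1, \Sigma_2$, and the utility $u_1$) is ours and is not taken from the abstract. With probability $p$ the restricted player is forced to follow $\sigma_{\mathrm{fix}}$, and with probability $1-p$ she may play any strategy in $\Sigma_2$; the robust counter-strategy is then the unrestricted player's part of an equilibrium of this game:
\[
  \Sigma_2^{\,p,\sigma_{\mathrm{fix}}}
    = \bigl\{\, p\,\sigma_{\mathrm{fix}} + (1-p)\,\sigma \;\bigm|\; \sigma \in \Sigma_2 \,\bigr\},
  \qquad
  \sigma_1^{*} \in \operatorname*{arg\,max}_{\sigma_1 \in \Sigma_1}\;
    \min_{\sigma_2 \in \Sigma_2^{\,p,\sigma_{\mathrm{fix}}}} u_1(\sigma_1, \sigma_2).
\]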