Paper Information: Identify the Nash Equilibrium in Static Games with Random Payoffs

We study the problem of learning the pure Nash equilibrium of a two-player zero-sum static game whose payoffs are random with unknown distributions, using efficient payoff queries. We model this problem as a multi-armed bandit, a framework well suited to efficiently identifying the best arm among arms with random rewards, and propose two algorithms: LUCB-G, which is based on confidence bounds, and a racing algorithm, which is based on successive action elimination. We also analyze the lower bound on sample complexity for the case in which a Nash equilibrium exists.
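To make the problem setting concrete, the following Python sketch shows what identifying a pure Nash equilibrium from noisy payoff queries looks like. It is not LUCB-G or the racing algorithm from the paper: it uses a naive uniform-sampling baseline that queries every entry the same number of times, which is the kind of cost adaptive bandit methods aim to reduce. The names `pure_nash`, `estimate_by_uniform_sampling`, `noisy_query`, and `TRUE_MEANS`, as well as the Gaussian noise model and the example matrix, are illustrative assumptions and do not come from the paper.

```python
import random

def pure_nash(payoff):
    """Return all saddle points (i, j) of a zero-sum payoff matrix.

    payoff[i][j] is the row player's payoff (the column player gets its
    negative). (i, j) is a pure Nash equilibrium when the entry is the
    maximum of its column and the minimum of its row.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    points = []
    for i in range(n_rows):
        for j in range(n_cols):
            v = payoff[i][j]
            # Row player cannot gain by switching rows within column j.
            row_cannot_improve = all(payoff[k][j] <= v for k in range(n_rows))
            # Column player cannot gain by switching columns within row i.
            col_cannot_improve = all(payoff[i][l] >= v for l in range(n_cols))
            if row_cannot_improve and col_cannot_improve:
                points.append((i, j))
    return points

def estimate_by_uniform_sampling(query, n_rows, n_cols, samples_per_entry):
    """Naive baseline (an assumption, not the paper's method): query every
    entry the same number of times and return the matrix of sample means."""
    return [
        [
            sum(query(i, j) for _ in range(samples_per_entry)) / samples_per_entry
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]

# Hypothetical mean payoffs, unknown to the learner; the saddle point is (1, 1).
TRUE_MEANS = [
    [3.0, 1.0, 4.0],
    [5.0, 2.0, 6.0],
    [0.0, -1.0, 7.0],
]

_rng = random.Random(0)

def noisy_query(i, j):
    """One payoff query: the mean payoff plus Gaussian noise (assumed model)."""
    return _rng.gauss(TRUE_MEANS[i][j], 0.5)

if __name__ == "__main__":
    estimate = estimate_by_uniform_sampling(noisy_query, 3, 3, samples_per_entry=200)
    print(pure_nash(estimate))
```

The baseline spends the same number of queries on every entry, including entries that are clearly far from the equilibrium. Confidence-bound and elimination approaches, as described in the abstract, instead concentrate queries on the entries that are still ambiguous.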
