Identify the Nash Equilibrium in Static Games with Random Payoffs

We study the problem of learning the pure Nash Equilibrium of a two-player zero-sum static game with random payoffs under unknown distributions via efficient payoff queries. We apply a multi-armed bandit model to this problem because of its ability to identify the best arm efficiently among arms with random rewards, and we propose two algorithms: LUCB-G, based on confidence bounds, and a racing algorithm, based on successive action elimination. We provide an analysis of the sample complexity lower bound when the Nash Equilibrium exists.
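To make the problem setting concrete, the following minimal sketch shows a naive uniform-sampling baseline. It is not LUCB-G or the racing algorithm, whose details are not given here. It queries every action pair the same number of times, averages the noisy payoffs, and looks for a saddle point of the estimated matrix. The names `estimate_payoffs`, `find_pure_nash`, and `make_noisy_query`, the Gaussian noise model, and the convention that the row player maximizes while the column player minimizes are all assumptions made for illustration.

```python
import random


def find_pure_nash(payoffs):
    """Return all saddle points (i, j) of a zero-sum payoff matrix.

    Assumed convention: the row player maximizes and the column player
    minimizes, so (i, j) is a pure Nash Equilibrium when the entry is
    the largest in its column and the smallest in its row.
    """
    points = []
    for i, row in enumerate(payoffs):
        for j, value in enumerate(row):
            column = [r[j] for r in payoffs]
            if value == max(column) and value == min(row):
                points.append((i, j))
    return points


def estimate_payoffs(query, n_rows, n_cols, samples_per_entry):
    """Estimate mean payoffs by querying every action pair equally often."""
    return [
        [
            sum(query(i, j) for _ in range(samples_per_entry)) / samples_per_entry
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


def make_noisy_query(true_means, noise_std, seed):
    """Build a hypothetical payoff oracle: true mean plus Gaussian noise."""
    rng = random.Random(seed)

    def query(i, j):
        return rng.gauss(true_means[i][j], noise_std)

    return query


# Illustrative game whose true mean matrix has a saddle point at (1, 1).
TRUE_MEANS = [[3, 1, 4], [5, 4, 6], [2, 0, 7]]
query = make_noisy_query(TRUE_MEANS, noise_std=0.5, seed=0)
estimated = estimate_payoffs(query, n_rows=3, n_cols=3, samples_per_entry=2000)
print(find_pure_nash(estimated))
```

Uniform sampling spends the same budget on every entry, including those that are clearly not part of any equilibrium. Adaptive bandit-style methods such as the two proposed here aim to avoid that cost by concentrating queries on the entries that remain ambiguous.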
