Bounding Regret in Empirical Games

Empirical game-theoretic analysis refers to a set of models and techniques for solving large-scale games. However, these methods lack a quantitative guarantee on the quality of the approximate Nash equilibria (NE) they output. A natural quality measure for an approximate NE is its regret in the game, i.e., the maximum gain any player can obtain by unilaterally deviating from it. We formulate this deviation-gain computation as a multi-armed bandit problem, with a new optimization objective unlike those studied in prior work. We propose an efficient algorithm for the problem, Super-Arm UCB (SAUCB), along with a number of variants. We present sample complexity results as well as extensive experiments showing that SAUCB outperforms several baselines.
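
The abstract does not spell out SAUCB itself, so the following is only a minimal sketch of the underlying bandit formulation, under assumed details: each unilateral deviation from the candidate profile is modeled as an arm, payoffs are noisy bounded samples, and a simple UCB-style rule splits a sampling budget between the candidate profile (arm 0) and the currently most promising deviation. The payoff oracle `sample_payoff`, the exploration constant, and the arm-selection rule are all illustrative, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy payoff oracle: arm 0 is the candidate profile itself,
# arms 1..K are unilateral deviations. True means are unknown to the sampler.
true_means = np.array([0.50, 0.48, 0.55, 0.52])  # illustrative values only

def sample_payoff(arm):
    """Return one noisy payoff sample for the given arm, clipped to [0, 1]."""
    return np.clip(true_means[arm] + rng.normal(scale=0.1), 0.0, 1.0)

def ucb_deviation_gain(n_arms, budget=5000, c=2.0):
    """UCB-style estimate of the best deviation gain (regret) of arm 0
    relative to the deviation arms 1..n_arms-1."""
    counts = np.ones(n_arms)
    sums = np.array([sample_payoff(a) for a in range(n_arms)])  # one pull each
    for t in range(n_arms, budget):
        means = sums / counts
        bonus = np.sqrt(c * np.log(t + 1) / counts)
        # The regret estimate depends on the profile arm and on whichever
        # deviation currently looks best, so sample the less-explored of the two.
        ucb = means + bonus
        candidates = [0, 1 + int(np.argmax(ucb[1:]))]
        arm = max(candidates, key=lambda a: bonus[a])
        sums[arm] += sample_payoff(arm)
        counts[arm] += 1
    means = sums / counts
    return max(0.0, means[1:].max() - means[0])

print(f"estimated regret: {ucb_deviation_gain(len(true_means)):.3f}")
```

The returned quantity, max(0, best deviation mean minus profile mean), is the regret measure the abstract describes; the question the paper studies is how to allocate samples so that this estimate comes with a guarantee at low sample cost.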
