Bounding Regret in Empirical Games

Empirical game-theoretic analysis refers to a set of models and techniques for solving large-scale games. However, these methods lack a quantitative guarantee on the quality of the approximate Nash equilibria (NE) they output. A natural quality measure for an approximate NE is its regret in the game, i.e., the maximum gain any player can obtain by unilaterally deviating from it. We formulate this deviation-gain computation as a multi-armed bandit problem, with a new optimization objective unlike those studied in prior work. We propose an efficient algorithm for the problem, Super-Arm UCB (SAUCB), along with a number of variants. We present sample complexity results as well as extensive experiments showing that SAUCB outperforms several baselines.
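
The abstract does not spell out SAUCB itself, so the following is only a minimal sketch of the underlying bandit formulation, under assumed details: each unilateral deviation from the candidate profile is modeled as an arm, payoffs are noisy bounded samples, and a simple UCB-style rule splits a sampling budget between the candidate profile (arm 0) and the currently most promising deviation. The payoff oracle `sample_payoff`, the exploration constant, and the arm-selection rule are all illustrative, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy payoff oracle: arm 0 is the candidate profile itself,
# arms 1..K are unilateral deviations. True means are unknown to the sampler.
true_means = np.array([0.50, 0.48, 0.55, 0.52])  # illustrative values only

def sample_payoff(arm):
    """Return one noisy payoff sample for the given arm, clipped to [0, 1]."""
    return np.clip(true_means[arm] + rng.normal(scale=0.1), 0.0, 1.0)

def ucb_deviation_gain(n_arms, budget=5000, c=2.0):
    """UCB-style estimate of the best deviation gain (regret) of arm 0
    relative to the deviation arms 1..n_arms-1."""
    counts = np.ones(n_arms)
    sums = np.array([sample_payoff(a) for a in range(n_arms)])  # one pull each
    for t in range(n_arms, budget):
        means = sums / counts
        bonus = np.sqrt(c * np.log(t + 1) / counts)
        # The regret estimate depends on the profile arm and on whichever
        # deviation currently looks best, so sample the less-explored of the two.
        ucb = means + bonus
        candidates = [0, 1 + int(np.argmax(ucb[1:]))]
        arm = max(candidates, key=lambda a: bonus[a])
        sums[arm] += sample_payoff(arm)
        counts[arm] += 1
    means = sums / counts
    return max(0.0, means[1:].max() - means[0])

print(f"estimated regret: {ucb_deviation_gain(len(true_means)):.3f}")
```

The returned quantity, max(0, best deviation mean minus profile mean), is the regret measure the abstract describes; the question the paper studies is how to allocate samples so that this estimate comes with a guarantee at low sample cost.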
