Learning Probably Approximately Correct Maximin Strategies in Simulation-Based Games with Infinite Strategy Spaces
[1] Aditya Gopalan, et al. On Kernelized Multi-armed Bandits, 2017, ICML.
[2] Amy Greenwald, et al. Empirical Mechanism Design: Designing Mechanisms from Data, 2019, UAI.
[3] Tao Qin, et al. Competitive Bridge Bidding with Deep Neural Networks, 2019, AAMAS.
[4] David S. Leslie, et al. Bandit learning in concave N-person games, 2018, arXiv:1810.01925.
[5] Michael P. Wellman, et al. Strategic analysis with simulation-based games, 2009, Proceedings of the 2009 Winter Simulation Conference (WSC).
[6] H. Stackelberg, et al. Marktform und Gleichgewicht, 1935.
[7] Marcello Restelli, et al. Equilibrium approximation in simulation-based extensive-form games, 2011, AAMAS.
[8] J. Zico Kolter, et al. What game are we playing? End-to-end learning in normal and extensive form games, 2018, IJCAI.
[9] J. Zico Kolter, et al. Large Scale Learning of Agent Rationality in Two-Player Zero-Sum Games, 2019, AAAI.
[10] M. Sion. On general minimax theorems, 1958.
[11] Eli Upfal, et al. Learning Simulation-Based Games from Data, 2019, AAMAS.
[12] Wouter M. Koolen, et al. Maximin Action Identification: A New Bandit Framework for Games, 2016, COLT.
[13] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.
[14] R. Munos, et al. Best Arm Identification in Multi-Armed Bandits, 2010, COLT.
[15] Noam Brown, et al. Superhuman AI for multiplayer poker, 2019, Science.
[16] Noam Brown, et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, 2018, Science.
[17] Tuomas Sandholm, et al. Safe and Nested Subgame Solving for Imperfect-Information Games, 2017, NIPS.
[18] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.
[19] Nicola Gatti, et al. Truthful learning mechanisms for multi-slot sponsored search auctions with externalities, 2012, Artif. Intell.
[20] Michal Valko, et al. Multiagent Evaluation under Incomplete Information, 2019, NeurIPS.
[21] Kaare Brandt Petersen, et al. The Matrix Cookbook, 2006.
[22] Michael P. Wellman, et al. A Regression Approach for Modeling Games With Many Symmetric Players, 2018, AAAI.
[23] D K Smith, et al. Numerical Optimization, 2001, J. Oper. Res. Soc.
[24] Samuel Sokota, et al. Learning Deviation Payoffs in Simulation-Based Games, 2019, AAAI.
[25] Milind Tambe, et al. Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization, 2018, AAAI.
[26] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[27] Michael P. Wellman, et al. Learning payoff functions in infinite games, 2005, Machine Learning.
[28] Aurélien Garivier, et al. On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models, 2014, J. Mach. Learn. Res.
[29] Joel Z. Leibo, et al. A Generalised Method for Empirical Game Theoretic Analysis, 2018, AAMAS.
[30] Milind Tambe, et al. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned, 2011.
[31] Michael P. Wellman, et al. Probably Almost Stable Strategy Profiles in Simulation-Based Games, 2019.
[32] Carl E. Rasmussen, et al. Gaussian processes for machine learning, 2005, Adaptive Computation and Machine Learning.
[33] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem, 2004, NIPS.
[34] Marcello Restelli, et al. Regret Minimization Algorithms for the Followers Behaviour Identification in Leadership Games, 2017, UAI.