Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent
Karl Tuyls | Marc Lanctot | Dustin Morrill | Julien Pérolat | Jean-Baptiste Lespiau | Finbarr Timbers | Edward Lockhart