Michal Valko | Tadashi Kozuno | Pierre Ménard | Rémi Munos