Yuandong Tian | Jason D. Lee | Simon S. Du | Yulai Zhao
[1] Stephen D. Patek,et al. Stochastic and shortest path games: theory and algorithms , 1997 .
[2] Tuomas Sandholm,et al. Solving Imperfect-Information Games via Discounted Regret Minimization , 2018, AAAI.
[3] Noah Golowich,et al. Independent Policy Gradient Methods for Competitive Reinforcement Learning , 2021, NeurIPS.
[4] Shie Mannor,et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning , 2005, Ann. Oper. Res.
[5] Shai Ben-David,et al. Understanding Machine Learning: From Theory to Algorithms , 2014 .
[6] Luke S. Zettlemoyer,et al. Reinforcement Learning for Mapping Instructions to Actions , 2009, ACL.
[7] Nicolas Le Roux,et al. A Geometric Perspective on Optimal Representations for Reinforcement Learning , 2019, NeurIPS.
[8] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[9] Rémi Munos,et al. Error Bounds for Approximate Value Iteration , 2005, AAAI.
[10] Yuxin Chen,et al. Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization , 2020, Oper. Res.
[11] Aranyak Mehta,et al. Progress in approximate nash equilibria , 2007, EC '07.
[12] Chi Jin,et al. Near-Optimal Reinforcement Learning with Self-Play , 2020, NeurIPS.
[13] Bruno Scherrer,et al. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games , 2015, ICML.
[14] J. Filar,et al. Competitive Markov Decision Processes , 1996 .
[15] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[16] Karl Tuyls,et al. Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent , 2019, IJCAI.
[17] Michael H. Bowling,et al. Regret Minimization in Games with Incomplete Information , 2007, NIPS.
[18] Noam Brown,et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals , 2018, Science.
[19] Shimon Whiteson,et al. Learning with Opponent-Learning Awareness , 2017, AAMAS.
[20] Stefano Ermon,et al. Multi-Agent Generative Adversarial Imitation Learning , 2018, NeurIPS.
[21] Tamer Basar,et al. Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games , 2019, NeurIPS.
[22] Peter Corcoran,et al. Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning , 2017, ArXiv.
[23] Lillian J. Ratliff,et al. Global Convergence of Policy Gradient for Sequential Zero-Sum Linear Quadratic Dynamic Games , 2019, ArXiv.
[24] Shie Mannor,et al. Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs , 2020, AAAI.
[25] Hao Zhu,et al. Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies , 2019, SIAM J. Control. Optim.
[26] L. Shapley,et al. Stochastic Games , 1953, Proceedings of the National Academy of Sciences.
[27] Bruno Scherrer,et al. On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games , 2016, AISTATS.
[28] David Silver,et al. Fictitious Self-Play in Extensive-Form Games , 2015, ICML.
[29] Nan Jiang,et al. Abstraction Selection in Model-based Reinforcement Learning , 2015, ICML.
[30] Arnoud Pastink,et al. On the communication complexity of approximate Nash equilibria , 2012, Games Econ. Behav.
[31] Chi Jin,et al. Provable Self-Play Algorithms for Competitive Reinforcement Learning , 2020, ICML.
[32] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[33] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[34] Yee Whye Teh,et al. Actor-Critic Reinforcement Learning with Energy-Based Policies , 2012, EWRL.
[35] Bruno Scherrer,et al. Approximate Policy Iteration Schemes: A Comparison , 2014, ICML.
[36] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[37] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[38] Jalaj Bhandari,et al. Global Optimality Guarantees For Policy Gradient Methods , 2019, ArXiv.
[39] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[40] Yuandong Tian,et al. ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero , 2019, ICML.
[41] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[42] Honglak Lee,et al. Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games , 2016, IJCAI.
[43] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[44] Oskari Tammelin,et al. Solving Large Imperfect Information Games Using CFR+ , 2014, ArXiv.
[45] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[46] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[47] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[48] S. Kakade,et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes , 2019, COLT.
[49] Karthik Sridharan,et al. Optimization, Learning, and Games with Predictable Sequences , 2013, NIPS.
[50] J. Robinson. An Iterative Method of Solving a Game , 1951, Classics in Game Theory.
[51] Matthieu Geist,et al. Approximate Modified Policy Iteration , 2012, ICML.
[52] Zhuoran Yang,et al. Provable Q-Iteration with L infinity Guarantees and Function Approximation , 2019 .
[53] Mika Göös,et al. Near-Optimal Communication Lower Bounds for Approximate Nash Equilibria , 2018, FOCS.
[54] Noah A. Smith,et al. Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions , 2010, NAACL.
[55] Matthieu Geist,et al. Approximate modified policy iteration and its application to the game of Tetris , 2015, J. Mach. Learn. Res.
[56] Paul G. Spirakis,et al. Computing Approximate Nash Equilibria in Polymatrix Games , 2015, Algorithmica.
[57] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.