A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games
[1] R. Srikant,et al. On The Convergence Of Policy Iteration-Based Reinforcement Learning With Monte Carlo Policy Evaluation , 2023, AISTATS.
[2] Sarnaduti Brahma,et al. Convergence Rates of Asynchronous Policy Iteration for Zero-Sum Markov Games under Stochastic and Optimistic Settings , 2022, 2022 IEEE 61st Conference on Decision and Control (CDC).
[3] R. Srikant,et al. Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation , 2022, 2022 IEEE 61st Conference on Decision and Control (CDC).
[4] D. Schuurmans,et al. Making Linear MDPs Practical via Contrastive Representation Learning , 2022, ICML.
[5] Asuman Ozdaglar,et al. Independent Learning in Stochastic Games , 2021, ArXiv.
[6] Wen Sun,et al. Representation Learning for Online and Offline RL in Low-rank MDPs , 2021, ICLR.
[7] D. Bertsekas. Distributed Asynchronous Policy Iteration for Sequential Zero-Sum Games and Minimax Control , 2021, ArXiv.
[8] Tiancheng Yu,et al. The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces , 2021, ICML.
[9] Jason D. Lee,et al. Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games , 2021, AISTATS.
[10] Noah Golowich,et al. Independent Policy Gradient Methods for Competitive Reinforcement Learning , 2021, NeurIPS.
[11] Yaodong Yang,et al. An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective , 2020, ArXiv.
[12] Qinghua Liu,et al. A Sharp Analysis of Model-based Reinforcement Learning with Self-Play , 2020, ICML.
[13] Lin F. Yang,et al. Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity , 2020, NeurIPS.
[14] Yang Yang,et al. Multi-robot path planning based on a deep reinforcement learning DQN algorithm , 2020, CAAI Trans. Intell. Technol..
[15] S. Kakade,et al. FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs , 2020, NeurIPS.
[16] Yuxin Chen,et al. Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model , 2020, NeurIPS.
[17] Zhuoran Yang,et al. Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium , 2020, COLT.
[18] Chi Jin,et al. Provable Self-Play Algorithms for Competitive Reinforcement Learning , 2020, ICML.
[19] Akshay Krishnamurthy,et al. Reward-Free Exploration for Reinforcement Learning , 2020, ICML.
[20] A. Wierman,et al. Scalable Reinforcement Learning for Multiagent Networked Systems , 2019, Oper. Res..
[21] T. Başar,et al. Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms , 2019, Handbook of Reinforcement Learning and Control.
[22] Tamer Basar,et al. Non-Cooperative Inverse Reinforcement Learning , 2019, NeurIPS.
[23] M. Ghavamzadeh,et al. Multi-step Greedy Reinforcement Learning Algorithms , 2019, ICML.
[24] Lin F. Yang,et al. Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity , 2019, AISTATS.
[25] Lin F. Yang,et al. Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal , 2019, COLT.
[26] Mengdi Wang,et al. Feature-Based Q-Learning for Two-Player Stochastic Games , 2019, ArXiv.
[27] D. Shah,et al. Non-Asymptotic Analysis of Monte Carlo Tree Search , 2019, Proceedings of the ACM on Measurement and Analysis of Computing Systems.
[28] Shie Mannor,et al. How to Combine Tree-Search Methods in Reinforcement Learning , 2018, AAAI.
[29] Shie Mannor,et al. Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning , 2018, NIPS.
[30] Shie Mannor,et al. Beyond the One Step Greedy Approach in Reinforcement Learning , 2018, ICML.
[31] Demis Hassabis,et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm , 2017, ArXiv.
[32] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[33] Amnon Shashua,et al. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving , 2016, ArXiv.
[34] Sergey Levine,et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[35] Matthieu Geist,et al. Softened Approximate Policy Iteration for Markov Games , 2016, ICML.
[36] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[37] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[38] Bruno Scherrer,et al. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games , 2015, ICML.
[39] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[40] Peter Bro Miltersen,et al. Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor , 2010, JACM.
[41] Michael P. Wellman,et al. Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..
[42] Michail G. Lagoudakis,et al. Value Function Approximation in Zero-Sum Markov Games , 2002, UAI.
[43] Ariel Rubinstein,et al. Experience from a Course in Game Theory: Pre- and Post-class Problem Sets as a Didactic Device , 1999 .
[44] Gerald Tesauro,et al. On-line Policy Improvement using Monte-Carlo Search , 1996, NIPS.
[45] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[46] J. Filar,et al. On the Computation of Equilibria in Discounted Stochastic Dynamic Games , 1986 .
[47] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[48] J. Wal. Discounted Markov games: Generalized policy iteration method , 1978 .
[49] M. Pollatschek,et al. Algorithms for Stochastic Games with Geometrical Interpretation , 1969 .
[50] L. Shapley,et al. Stochastic Games , 1953, Proceedings of the National Academy of Sciences.
[51] R. Srikant,et al. The Role of Lookahead and Approximate Policy Evaluation in Policy Iteration with Linear Value Function Approximation , 2021, ArXiv.
[52] Handbook of Reinforcement Learning and Control , 2021, Studies in Systems, Decision and Control.
[53] S. Kakade,et al. Reinforcement Learning: Theory and Algorithms , 2019 .
[54] Dimitri P. Bertsekas,et al. Neuro-Dynamic Programming , 2009, Encyclopedia of Optimization.
[55] Stephen D. Patek,et al. Stochastic and shortest path games: theory and algorithms , 1997 .
[56] J. Filar,et al. On the Algorithm of Pollatschek and Avi-Itzhak , 1991 .
[57] Anne Condon,et al. On Algorithms for Simple Stochastic Games , 1990, Advances In Computational Complexity Theory.
[58] R. Karp,et al. On Nonterminating Stochastic Games , 1966 .