Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games
[1] Jason D. Lee, et al. Can We Find Nash Equilibria at a Linear Rate in Markov Games?, 2023, ICLR.
[2] Weiqiang Zheng, et al. Doubly Optimal No-Regret Learning in Monotone Games, 2023, ICML.
[3] Jianghai Hu, et al. Zeroth-Order Learning in Continuous Games via Residual Pseudogradient Estimates, 2023, arXiv:2301.02279.
[4] T. Zhang, et al. A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games, 2022, ICML.
[5] S. Du, et al. Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games, 2022, ICLR.
[6] Cong Ma, et al. $O(T^{-1})$ Convergence of Optimistic-Follow-the-Regularized-Leader in Two-Player Zero-Sum Markov Games, 2022, arXiv.
[7] Caiming Xiong, et al. Policy Optimization for Markov Games: Unified Framework and Faster Convergence, 2022, NeurIPS.
[8] Eduard A. Gorbunov, et al. Last-Iterate Convergence of Optimistic Gradient Method for Monotone Variational Inequalities, 2022, NeurIPS.
[9] M. Kamgarpour, et al. On the Rate of Convergence of Payoff-based Algorithms to Nash Equilibrium in Strongly Monotone Games, 2022, arXiv.
[10] Tianyi Lin, et al. Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback, 2021, arXiv:2112.02856.
[11] Chi Jin, et al. V-Learning - A Simple, Efficient, Decentralized Algorithm for Multiagent RL, 2021, arXiv.
[12] Ashutosh Nayyar, et al. Learning Zero-sum Stochastic Games with Posterior Sampling, 2021, arXiv.
[13] Zhuoran Yang, et al. Towards General Function Approximation in Zero-Sum Markov Games, 2021, ICLR.
[14] Tiancheng Yu, et al. The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces, 2021, ICML.
[15] Tamer Basar, et al. Decentralized Q-Learning in Zero-sum Markov Games, 2021, NeurIPS.
[16] Yuejie Chi, et al. Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization, 2021, NeurIPS.
[17] Jason D. Lee, et al. Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games, 2021, AISTATS.
[18] Haipeng Luo, et al. Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games, 2021, COLT.
[19] Noah Golowich, et al. Independent Policy Gradient Methods for Competitive Reinforcement Learning, 2021, NeurIPS.
[20] Anant Sahai, et al. On the Impossibility of Convergence of Mixed Strategies with No Regret Learning, 2020, arXiv.
[21] Noah Golowich, et al. Tight last-iterate convergence rates for no-regret learning in multi-player games, 2020, NeurIPS.
[22] A. Ozdaglar, et al. Fictitious play in zero-sum stochastic games, 2020, SIAM J. Control Optim.
[23] Qinghua Liu, et al. A Sharp Analysis of Model-based Reinforcement Learning with Self-Play, 2020, ICML.
[24] Chi Jin, et al. Near-Optimal Reinforcement Learning with Self-Play, 2020, NeurIPS.
[25] Haipeng Luo, et al. Linear Last-iterate Convergence in Constrained Saddle-point Optimization, 2020, ICLR.
[26] Zhuoran Yang, et al. Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium, 2020, COLT.
[27] Chi Jin, et al. Provable Self-Play Algorithms for Competitive Reinforcement Learning, 2020, ICML.
[28] J. Malick, et al. On the convergence of single-call stochastic extra-gradient methods, 2019, NeurIPS.
[29] Xiaoyu Chen, et al. Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP, 2019, ICLR.
[30] David S. Leslie, et al. Bandit learning in concave $N$-person games, 2018, arXiv:1810.01925.
[31] Georgios Piliouras, et al. Multiplicative Weights Update in Zero-Sum Games, 2018, EC.
[32] Tengyuan Liang, et al. Interaction Matters: A Note on Non-asymptotic Local Convergence of Generative Adversarial Networks, 2018, AISTATS.
[33] Chi-Jen Lu, et al. Online Reinforcement Learning in Stochastic Games, 2017, NIPS.
[34] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[35] Christos H. Papadimitriou, et al. Cycles in adversarial regularized learning, 2017, SODA.
[36] Serdar Yüksel, et al. Decentralized Q-Learning for Stochastic Teams and Games, 2015, IEEE Transactions on Automatic Control.
[37] Gergely Neu, et al. Explore no more: Improved high-probability regret bounds for non-stochastic bandits, 2015, NIPS.
[38] Tor Lattimore, et al. Near-optimal PAC bounds for discounted MDPs, 2014, Theor. Comput. Sci.
[39] Constantinos Daskalakis, et al. Near-optimal no-regret algorithms for zero-sum games, 2011, SODA.
[40] Michael P. Wellman, et al. Nash Q-Learning for General-Sum Stochastic Games, 2003, J. Mach. Learn. Res.
[41] Vincent Conitzer, et al. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents, 2003, Machine Learning.
[42] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[43] Manuela M. Veloso, et al. Rational and Convergent Learning in Stochastic Games, 2001, IJCAI.
[44] Csaba Szepesvári, et al. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms, 1999, Neural Computation.
[45] P. Tseng. On linear convergence of iterative methods for the variational inequality problem, 1995.
[46] Michael L. Littman. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[47] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[48] J. van der Wal. Discounted Markov games: Generalized policy iteration method, 1978.
[49] M. Pollatschek, et al. Algorithms for Stochastic Games with Geometrical Interpretation, 1969.
[50] L. Shapley. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[51] J. von Neumann. Zur Theorie der Gesellschaftsspiele, 1928, Mathematische Annalen.
[52] Shaocong Ma, et al. Sample Efficient Stochastic Policy Extragradient Algorithm for Zero-Sum Markov Game, 2022, ICLR.
[53] Yang Cai, et al. Finite-Time Last-Iterate Convergence for Learning in Multi-Player Games, 2022, NeurIPS.
[54] V. Cevher, et al. A Natural Actor-Critic Framework for Zero-Sum Markov Games, 2022, ICML.
[55] P. Mertikopoulos, et al. On the Rate of Convergence of Regularized Learning in Games: From Bandits and Uncertainty to Optimism and Beyond, 2021, NeurIPS.
[56] Quanquan Gu, et al. Almost Optimal Algorithms for Two-player Zero-Sum Markov Games with Linear Function Approximation, 2021.
[57] Shai Shalev-Shwartz, Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms, 2014, Cambridge University Press.
[58] J. Filar, et al. On the Algorithm of Pollatschek and Avi-Itzhak, 1991.
[59] R. Karp, et al. On Nonterminating Stochastic Games, 1966.