Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium
[1] Zhuoran Yang,et al. Towards General Function Approximation in Zero-Sum Markov Games , 2021, ICLR.
[2] Tiancheng Yu,et al. The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces , 2021, ICML.
[3] Chi Jin,et al. V-Learning - A Simple, Efficient, Decentralized Algorithm for Multiagent RL , 2021, ArXiv.
[4] Zhuoran Yang,et al. On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game , 2021, ICML.
[5] Shachar Lovett,et al. Bilinear Classes: A Structural Framework for Provable Generalization in RL , 2021, ICML.
[6] Chi Jin,et al. Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms , 2021, NeurIPS.
[7] Quanquan Gu,et al. Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes , 2020, COLT.
[8] Lin F. Yang,et al. Minimax Sample Complexity for Turn-based Stochastic Game , 2020, UAI.
[9] Qinghua Liu,et al. A Sharp Analysis of Model-based Reinforcement Learning with Self-Play , 2020, ICML.
[10] Quanquan Gu,et al. Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping , 2020, ICML.
[11] Quanquan Gu,et al. Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation , 2021, ArXiv.
[12] Michael I. Jordan,et al. On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces , 2021.
[13] Chi Jin,et al. Near-Optimal Reinforcement Learning with Self-Play , 2020, NeurIPS.
[14] Mengdi Wang,et al. Model-Based Reinforcement Learning with Value-Targeted Regression , 2020, L4DC.
[15] Lin F. Yang,et al. Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension , 2020, NeurIPS.
[16] Mykel J. Kochenderfer,et al. Learning Near Optimal Policies with Low Inherent Bellman Error , 2020, ICML.
[17] Zhuoran Yang,et al. Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium , 2020, COLT.
[18] Chi Jin,et al. Provable Self-Play Algorithms for Competitive Reinforcement Learning , 2020, ICML.
[19] Quanquan Gu,et al. Neural Contextual Bandits with UCB-based Exploration , 2019, ICML.
[20] Ambuj Tewari,et al. Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles , 2019, AISTATS.
[21] Lin F. Yang,et al. Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity , 2019, AISTATS.
[22] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation , 2019, COLT.
[23] Mengdi Wang,et al. Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound , 2019, ICML.
[24] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[25] Noam Brown,et al. Superhuman AI for multiplayer poker , 2019, Science.
[26] Mengdi Wang,et al. Feature-Based Q-Learning for Two-Player Stochastic Games , 2019, ArXiv.
[27] Yuan Cao,et al. Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks , 2019, NeurIPS.
[28] Nan Jiang,et al. Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches , 2018, COLT.
[29] Yin Tat Lee,et al. Solving linear programs in the current matrix multiplication time , 2018, STOC.
[30] Olivier Pietquin,et al. Actor-Critic Fictitious Play in Simultaneous Move Multistage Games , 2018, AISTATS.
[31] Chen-Yu Wei,et al. Online Reinforcement Learning in Stochastic Games , 2017, NIPS.
[32] Aditya Gopalan,et al. On Kernelized Multi-armed Bandits , 2017, ICML.
[33] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.
[34] Olivier Pietquin,et al. Learning Nash Equilibrium for General-Sum Markov Games from Batch Data , 2016, AISTATS.
[35] Amnon Shashua,et al. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving , 2016, ArXiv.
[36] Matthieu Geist,et al. Softened Approximate Policy Iteration for Markov Games , 2016, ICML.
[37] Bruno Scherrer,et al. On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games , 2016, AISTATS.
[38] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[39] Bruno Scherrer,et al. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games , 2015, ICML.
[40] Benjamin Van Roy,et al. Eluder Dimension and the Sample Complexity of Optimistic Exploration , 2013, NIPS.
[41] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[42] Andreas Krause,et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting , 2009, IEEE Transactions on Information Theory.
[43] Michail G. Lagoudakis,et al. Value Function Approximation in Zero-Sum Markov Games , 2002, UAI.
[44] Narendra Karmarkar,et al. A new polynomial-time algorithm for linear programming , 1984, STOC '84.
[45] L. Shapley,et al. Stochastic Games , 1953, Proceedings of the National Academy of Sciences.