[1] Lin F. Yang,et al. Q-learning with Logarithmic Regret , 2020, AISTATS.
[2] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res.
[3] Yuantao Gu,et al. Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model , 2020, NeurIPS.
[4] Chi Jin,et al. Near-Optimal Reinforcement Learning with Self-Play , 2020, NeurIPS.
[5] Sergey Levine,et al. Model-Based Reinforcement Learning for Atari , 2019, ICLR.
[6] Max Simchowitz,et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs , 2019, NeurIPS.
[7] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[8] Sylvain Sorin,et al. Stochastic Games and Applications , 2003.
[9] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[10] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[11] Mengdi Wang,et al. Feature-Based Q-Learning for Two-Player Stochastic Games , 2019, ArXiv.
[12] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.
[13] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[14] Hao Wu,et al. Mastering Complex Control in MOBA Games with Deep Reinforcement Learning , 2019, AAAI.
[15] Ruosong Wang,et al. Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity , 2020, ArXiv.
[16] Xian Wu,et al. Variance reduced value iteration and faster algorithms for solving Markov decision processes , 2017, SODA.
[17] Sham M. Kakade,et al. Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity , 2020, NeurIPS.
[18] Lin F. Yang,et al. Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning? , 2020, NeurIPS.
[19] Peter Bro Miltersen,et al. Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor , 2010, JACM.
[20] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[21] Lin F. Yang,et al. Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity , 2019, AISTATS.
[22] Lin F. Yang,et al. On the Optimality of Sparse Model-Based Planning for Markov Decision Processes , 2019, ArXiv.
[23] L. Shapley,et al. Stochastic Games , 1953, Proceedings of the National Academy of Sciences.
[24] Yinyu Ye,et al. Towards solving 2-TBSG efficiently , 2019, Optim. Methods Softw..
[25] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003.
[26] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[27] Pieter Abbeel,et al. Benchmarking Model-Based Reinforcement Learning , 2019, ArXiv.