Is Q-learning Provably Efficient?
暂无分享,去创建一个
Michael I. Jordan | Sébastien Bubeck | Chi Jin | Zeyuan Allen-Zhu | Sébastien Bubeck | Chi Jin | Zeyuan Allen-Zhu
[1] Shipra Agrawal,et al. Optimistic posterior sampling for reinforcement learning: worst-case regret bounds , 2022, NIPS.
[2] Tor Lattimore,et al. PAC Bounds for Discounted MDPs , 2012, ALT.
[3] Yishay Mansour,et al. Learning Rates for Q-learning , 2004, J. Mach. Learn. Res..
[4] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.
[5] Benjamin Van Roy,et al. On Lower Bounds for Regret in Reinforcement Learning , 2016, ArXiv.
[6] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[7] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[8] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[9] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[10] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[11] Sham M. Kakade,et al. Variance Reduction Methods for Sublinear Reinforcement Learning , 2018, ArXiv.
[12] Xian Wu,et al. Variance reduced value iteration and faster algorithms for solving Markov decision processes , 2017, SODA.
[13] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[14] Sham M. Kakade,et al. On the sample complexity of reinforcement learning. , 2003 .
[15] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[16] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[17] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[18] Benjamin Van Roy,et al. Generalization and Exploration via Randomized Value Functions , 2014, ICML.
[19] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[20] Sergey Levine,et al. Temporal Difference Models: Model-Free Deep RL for Model-Based Control , 2018, ICLR.
[21] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[22] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[23] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[24] Sergey Levine,et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[25] Reid G. Simmons,et al. Complexity Analysis of Real-Time Reinforcement Learning , 1993, AAAI.
[26] Hilbert J. Kappen,et al. Speedy Q-Learning , 2011, NIPS.
[27] Sham M. Kakade,et al. Global Convergence of Policy Gradient Methods for Linearized Control Problems , 2018, ICML 2018.