Xiaoyu Chen | Liwei Wang | Yuanhao Wang | Kefan Dong
[1] Tor Lattimore and Marcus Hutter. PAC Bounds for Discounted MDPs, 2012, ALT.
[2] Alexander L. Strehl and Michael L. Littman. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[3] Sham M. Kakade. On the sample complexity of reinforcement learning, 2003.
[4] Alexander L. Strehl et al. PAC model-free reinforcement learning, 2006, ICML.
[5] Chi Jin et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[6] Volodymyr Mnih et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[7] István Szita and Csaba Szepesvári. Model-based reinforcement learning with nearly tight exploration complexity bounds, 2010, ICML.
[8] Eyal Even-Dar and Yishay Mansour. Learning Rates for Q-learning, 2004, J. Mach. Learn. Res.
[9] Volodymyr Mnih et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[10] Christoph Dann et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[11] Aaron Sidford et al. Variance reduced value iteration and faster algorithms for solving Markov decision processes, 2017, SODA.
[12] Aaron Sidford et al. Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model, 2018, NeurIPS.
[13] Thomas Jaksch et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[14] Mohammad Gheshlaghi Azar et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[15] John Schulman et al. Trust Region Policy Optimization, 2015, ICML.
[16] Ronen I. Brafman and Moshe Tennenholtz. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[17] Mohammad Gheshlaghi Azar et al. Speedy Q-Learning, 2011, NIPS.