暂无分享,去创建一个
[1] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[2] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[3] Vivek S. Borkar,et al. Empirical Q-Value Iteration , 2014, Stochastic Systems.
[4] Emma Brunskill,et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds , 2019, ICML.
[5] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[6] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[7] Christoph Dann,et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning , 2015, NIPS.
[8] Martin J. Wainwright,et al. Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning , 2019, 1905.06265.
[9] Xian Wu,et al. Variance reduced value iteration and faster algorithms for solving Markov decision processes , 2017, SODA.
[10] Yinyu Ye,et al. The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate , 2011, Math. Oper. Res..
[11] Alexander L. Strehl,et al. Probably Approximately Correct (PAC) Exploration in Reinforcement Learning , 2008, ISAIM.
[12] Sham M. Kakade,et al. On the sample complexity of reinforcement learning. , 2003 .
[13] C.C. White,et al. Dynamic programming and stochastic control , 1978, Proceedings of the IEEE.
[14] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[15] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..
[16] Mengdi Wang,et al. Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear Running Time , 2017, ArXiv.
[17] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.
[18] Benjamin Van Roy,et al. Model-based Reinforcement Learning and the Eluder Dimension , 2014, NIPS.
[19] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[20] Xian Wu,et al. Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model , 2018, NeurIPS.
[21] Satinder Singh,et al. An upper bound on the loss from approximate optimal-value functions , 1994, Machine Learning.