[1] Martin J. Wainwright, et al. Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning, 2021, ArXiv.
[2] Alexandre Proutiere, et al. Navigating to the Best Policy in Markov Decision Processes, 2021, NeurIPS.
[3] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[4] Massimiliano Pontil, et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.
[5] Aurélien Garivier, et al. On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models, 2014, J. Mach. Learn. Res.
[6] Shie Mannor, et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, 2006, J. Mach. Learn. Res.
[7] Max Simchowitz, et al. Task-Optimal Exploration in Linear Dynamical Systems, 2021, ICML.
[8] Xiangyang Ji, et al. Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon, 2021, COLT.
[9] E. Kaufmann, et al. Planning in Markov Decision Processes with Gap-Dependent Sample Complexity, 2020, NeurIPS.
[10] Anders Jonsson, et al. Fast active learning for pure exploration in reinforcement learning, 2020, ICML.
[11] Tengyu Ma, et al. Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive Multi-Step Bootstrap, 2021, COLT.
[12] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[13] Akshay Krishnamurthy, et al. Reward-Free Exploration for Reinforcement Learning, 2020, ICML.
[14] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[15] Lin F. Yang, et al. Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal, 2019, COLT.
[16] Sham M. Kakade. On the sample complexity of reinforcement learning, 2003.
[17] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[18] Ruosong Wang, et al. Is Long Horizon Reinforcement Learning More Difficult Than Short Horizon Reinforcement Learning?, 2020, ArXiv.
[19] Alexandre Proutière, et al. Exploration in Structured Reinforcement Learning, 2018, NeurIPS.
[20] Lihong Li, et al. Policy Certificates: Towards Accountable Reinforcement Learning, 2018, ICML.
[21] D. Freedman. On Tail Probabilities for Martingales, 1975.
[22] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[23] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[24] Alexandre Proutiere, et al. Best Policy Identification in Discounted MDPs: Problem-specific Sample Complexity, 2020, ArXiv.
[25] Lin F. Yang, et al. Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model, 2018, arXiv:1806.01492.
[26] Xiangyang Ji, et al. Nearly Minimax Optimal Reward-free Reinforcement Learning, 2020, ArXiv.
[27] Martin J. Wainwright, et al. Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis, 2020, SIAM J. Math. Data Sci.
[28] Yuantao Gu, et al. Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model, 2020, NeurIPS.
[29] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[30] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[31] Gergely Neu, et al. Online learning in episodic Markovian decision processes by relative entropy policy search, 2013, NIPS.
[32] Max Simchowitz, et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs, 2019, NeurIPS.
[33] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[34] Mykel J. Kochenderfer, et al. Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model, 2019, NeurIPS.
[35] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.