Settling the Sample Complexity of Online Reinforcement Learning