Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies