Towards Tight Bounds on the Sample Complexity of Average-reward MDPs