Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
Yuantao Gu, Yuejie Chi, Gen Li, Yuxin Chen, Yuting Wei