[1] Eric Moulines,et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.
[2] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[3] Csaba Szepesvári,et al. Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go? , 2018, AISTATS.
[4] R. Durrett. Probability: Theory and Examples , 1993 .
[5] Y. Ollivier,et al. Curvature, Concentration and Error Estimates for Markov Chain Monte Carlo , 2009, 0904.1312.
[6] R. Srikant,et al. Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning , 2019, NeurIPS.
[7] R. Srikant,et al. Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning , 2019, COLT.
[8] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .
[9] H. Robbins. A Stochastic Approximation Method , 1951 .
[10] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[11] Shie Mannor,et al. Finite Sample Analyses for TD(0) With Function Approximation , 2017, AAAI.
[12] Yingbin Liang,et al. Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation , 2019, ArXiv.
[13] Martin J. Wainwright,et al. Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning , 2019, 1905.06265.
[14] Sean P. Meyn,et al. An analysis of reinforcement learning with function approximation , 2008, ICML '08.
[15] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[16] Kaiqing Zhang,et al. Finite-Sample Analysis For Decentralized Batch Multi-Agent Reinforcement Learning With Networked Agents , 2018 .
[17] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[18] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[19] Benjamin Recht,et al. Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator , 2019, NeurIPS.
[20] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[21] Nathaniel Korda,et al. On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence , 2014, ICML.
[22] Bin Hu,et al. Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory , 2019, NeurIPS.
[23] Marek Petrik,et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms , 2015, UAI.
[24] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS.
[25] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[26] Gang Wang,et al. Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization , 2018, IEEE Transactions on Signal Processing.
[27] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[28] Csaba Szepesvári,et al. The Asymptotic Convergence-Rate of Q-learning , 1997, NIPS.
[29] Elizabeth L. Wilmer,et al. Markov Chains and Mixing Times , 2008 .
[30] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[31] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[32] Eric Moulines,et al. Non-asymptotic Analysis of Biased Stochastic Approximation Scheme , 2019, COLT.
[33] Ruggero Carli,et al. Lyapunov Theory for Discrete Time Systems , 2018, 1809.05289.
[34] Thinh T. Doan,et al. Performance of Q-learning with Linear Function Approximation: Stability and Finite-Time Analysis , 2019 .
[35] Brian D. O. Anderson,et al. Lyapunov Criterion for Stochastic Systems and Its Applications in Distributed Computation , 2019, IEEE Transactions on Automatic Control.