暂无分享,去创建一个
[1] E. Bolthausen. The Berry-Esseen theorem for functionals of discrete Markov chains , 1980 .
[2] D. Ruppert. A Newton-Raphson Version of the Multivariate Robbins-Monro Procedure , 1985 .
[3] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[4] Richard L. Tweedie,et al. Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.
[5] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[6] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[7] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[8] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[9] László Györfi,et al. A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.
[10] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[11] Harold J. Kushner,et al. Stochastic Approximation Algorithms and Applications , 1997, Applications of Mathematics.
[12] V. Borkar,et al. An analog scheme for fixed point computation. I. Theory , 1997 .
[13] Csaba Szepesvári,et al. The Asymptotic Convergence-Rate of Q-learning , 1997, NIPS.
[14] John N. Tsitsiklis,et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives , 1999, IEEE Trans. Autom. Control..
[15] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[16] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[17] Vivek S. Borkar,et al. Learning Algorithms for Markov Decision Processes with Average Cost , 2001, SIAM J. Control. Optim..
[18] Yishay Mansour,et al. Learning Rates for Q-learning , 2004, J. Mach. Learn. Res..
[19] P. Glynn,et al. Hoeffding's inequality for uniformly ergodic Markov chains , 2002 .
[20] S. Meyn,et al. Spectral theory and limit theorems for geometrically ergodic Markov processes , 2002, math/0209200.
[21] R. Schwabe,et al. A law of the iterated logarithm for stochastic approximation procedures in d-dimensional Euclidean space , 2003 .
[22] Sham M. Kakade,et al. On the sample complexity of reinforcement learning. , 2003 .
[23] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[24] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[25] A. Mokkadem,et al. The Compact Law of the Iterated Logarithm for Multivariate Stochastic Approximation Algorithms , 2005 .
[26] Sean P. Meyn,et al. Relative entropy and exponential deviation bounds for general Markov chains , 2005, Proceedings. International Symposium on Information Theory, 2005. ISIT 2005..
[27] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[28] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[29] David Choi,et al. A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning , 2001, Discret. Event Dyn. Syst..
[30] H. Robbins. A Stochastic Approximation Method , 1951 .
[31] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[32] Sean P. Meyn,et al. Q-learning and Pontryagin's Minimum Principle , 2009, Proceedings of the 48h IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[33] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[34] Hilbert J. Kappen,et al. Speedy Q-Learning , 2011, NIPS.
[35] Sean P. Meyn,et al. TD-learning with exploration , 2011, IEEE Conference on Decision and Control and European Control Conference.
[36] Eric Moulines,et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.
[37] Dimitri P. Bertsekas,et al. Q-learning and policy iteration algorithms for stochastic shortest path problems , 2012, Annals of Operations Research.
[38] Tor Lattimore,et al. The Sample-Complexity of General Reinforcement Learning , 2013, ICML.
[39] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[40] Sean P. Meyn,et al. Zap Q-Learning , 2017, NIPS.
[41] Sean P. Meyn,et al. Fastest Convergence for Q-learning , 2017, ArXiv.
[42] R. Srikant,et al. Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning , 2019, COLT.
[43] Sean P. Meyn,et al. Zap Q-Learning - A User's Guide , 2019, 2019 Fifth Indian Control Conference (ICC).
[44] Chong Li,et al. Model-Free Reinforcement Learning , 2019, Reinforcement Learning for Cyber-Physical Systems.
[45] Martin J. Wainwright,et al. Variance-reduced Q-learning is minimax optimal , 2019, ArXiv.
[46] Martin J. Wainwright,et al. Stochastic approximation with cone-contractive operators: Sharp 𝓁∞-bounds for Q-learning , 2019, ArXiv.
[47] Benoît R. Kloeckner. Effective Berry–Esseen and concentration bounds for Markov chains with a spectral gap , 2019, The Annals of Applied Probability.
[48] John E. R. Staddon,et al. The dynamics of behavior: Review of Sutton and Barto: Reinforcement Learning : An Introduction (2 nd ed.) , 2020 .
[49] Adam Wierman,et al. Finite-Time Analysis of Asynchronous Stochastic Approximation and Q-Learning , 2020, COLT.
[50] Ana Busic,et al. Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation , 2020, AISTATS.
[51] Siva Theja Maguluri,et al. Finite-Sample Analysis of Stochastic Approximation Using Smooth Convex Envelopes , 2020, ArXiv.
[52] Sean P. Meyn,et al. Fundamental Design Principles for Reinforcement Learning Algorithms , 2021, Handbook of Reinforcement Learning and Control.