Fundamental Design Principles for Reinforcement Learning Algorithms
[1] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[2] Amir Dembo,et al. Large Deviations Techniques and Applications , 1998 .
[3] Vivek S. Borkar,et al. Learning Algorithms for Markov Decision Processes with Average Cost , 2001, SIAM J. Control. Optim..
[4] H. Robbins. A Stochastic Approximation Method , 1951 .
[5] Le Song,et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation , 2017, ICML.
[6] S. Meyn,et al. Computable exponential convergence rates for stochastically ordered Markov processes , 1996 .
[7] J. H. Venter. An extension of the Robbins-Monro procedure , 1967 .
[8] J. Tsitsiklis,et al. Convergence rate of linear two-time-scale stochastic approximation , 2004, math/0405287.
[9] Sean P. Meyn,et al. Zap Q-Learning for Optimal Stopping , 2020, 2020 American Control Conference (ACC).
[10] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[11] J. Blum. Multidimensional Stochastic Approximation Methods , 1954 .
[12] S. Meyn. Large deviation asymptotics and control variates for simulating large functions , 2006, math/0603328.
[13] Csaba Szepesvári,et al. The Asymptotic Convergence-Rate of Q-learning , 1997, NIPS.
[14] S. Meyn,et al. Spectral theory and limit theorems for geometrically ergodic Markov processes , 2002, math/0209200.
[15] P. Glynn,et al. Hoeffding's inequality for uniformly ergodic Markov chains , 2002 .
[16] Sean P. Meyn,et al. Most likely paths to error when estimating the mean of a reflected random walk , 2009, Perform. Evaluation.
[17] Ana Busic,et al. Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation , 2020, AISTATS.
[18] A. Shwartz,et al. Stochastic approximations for finite-state Markov chains , 1990 .
[19] Richard L. Tweedie,et al. Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.
[20] Harold J. Kushner,et al. Stochastic Approximation Algorithms and Applications , 1997, Applications of Mathematics.
[21] V. Borkar,et al. A Concentration Bound for Stochastic Approximation via Alekseev’s Formula , 2015, Stochastic Systems.
[22] John N. Tsitsiklis,et al. Average cost temporal-difference learning , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[23] Csaba Szepesvári,et al. Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go? , 2018, AISTATS.
[24] Vijay R. Konda,et al. On Actor-Critic Algorithms , 2003, SIAM J. Control. Optim..
[25] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[26] Ken R. Duffy,et al. Large deviation asymptotics for busy periods , 2014 .
[27] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[28] Ana Busic,et al. On Matrix Momentum Stochastic Approximation and Applications to Q-learning , 2019, 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[29] David Choi,et al. A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning , 2001, Discret. Event Dyn. Syst..
[30] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[31] R. Srikant,et al. Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning , 2019, COLT.
[32] D. Ruppert,et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process , 1988 .
[33] Sean P. Meyn,et al. TD-learning with exploration , 2011, IEEE Conference on Decision and Control and European Control Conference.
[34] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[35] Peter Dayan,et al. Technical Note: Q-Learning , 2004, Machine Learning.
[36] Benjamin Van Roy,et al. A Tutorial on Thompson Sampling , 2017, Found. Trends Mach. Learn..
[37] S. Meyn,et al. Computable Bounds for Geometric Convergence Rates of Markov Chains , 1994 .
[38] C. Watkins. Learning from delayed rewards , 1989 .
[39] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[40] Sean P. Meyn,et al. Zap Q-Learning - A User's Guide , 2019, 2019 Fifth Indian Control Conference (ICC).
[41] John N. Tsitsiklis,et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives , 1999, IEEE Trans. Autom. Control..
[42] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[43] Sean P. Meyn,et al. Oja's algorithm for graph clustering, Markov spectral decomposition, and risk sensitive control , 2012, Autom..
[44] D. Paulin. Concentration inequalities for Markov chains by Marton couplings and spectral methods , 2012, 1212.2015.
[45] D. Bertsekas,et al. Q-learning algorithms for optimal stopping based on least squares , 2007, 2007 European Control Conference (ECC).
[46] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[47] M. Metivier,et al. Applications of a Kushner and Clark lemma to general classes of stochastic algorithms , 1984, IEEE Trans. Inf. Theory.
[48] J. Kiefer,et al. Stochastic Estimation of the Maximum of a Regression Function , 1952 .
[49] Ana Busic,et al. Zap Q-Learning With Nonlinear Function Approximation , 2019, NeurIPS.
[50] Dimitri P. Bertsekas,et al. Q-learning and policy iteration algorithms for stochastic shortest path problems , 2012, Annals of Operations Research.
[51] Magnus Egerstedt,et al. Performance regulation and tracking via lookahead simulation: Preliminary results and validation , 2017, 2017 IEEE 56th Annual Conference on Decision and Control (CDC).
[52] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[53] D. Ruppert. A Newton-Raphson Version of the Multivariate Robbins-Monro Procedure , 1985 .
[54] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[55] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[56] Martin J. Wainwright,et al. Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning , 2019, 1905.06265.
[57] K. Chung. On a Stochastic Approximation Method , 1954 .
[58] Yishay Mansour,et al. Learning Rates for Q-learning , 2004, J. Mach. Learn. Res..
[59] Shie Mannor,et al. Concentration Bounds for Two Timescale Stochastic Approximation with Applications to Reinforcement Learning , 2017, ArXiv.
[60] Sean P. Meyn. Control Techniques for Complex Networks: Workload , 2007 .
[61] Sean P. Meyn,et al. Q-learning and Pontryagin's Minimum Principle , 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[62] Eric Moulines,et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.
[63] Sean P. Meyn,et al. Zap Q-Learning , 2017, NIPS.
[64] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS.
[65] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.