暂无分享,去创建一个
[1] T. Sideris. Ordinary Differential Equations and Dynamical Systems , 2013 .
[2] L. Breuer. Introduction to Stochastic Processes , 2022, Statistical Methods for Climate Scientists.
[3] V. Borkar. Stochastic approximation with two time scales , 1997 .
[4] Dale Schuurmans,et al. On the Global Convergence Rates of Softmax Policy Gradient Methods , 2020, ICML.
[5] Marcin Andrychowicz,et al. Solving Rubik's Cube with a Robot Hand , 2019, ArXiv.
[6] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[7] S. Kakade,et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes , 2019, COLT.
[8] Zhe Wang,et al. Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms , 2020, ArXiv.
[9] Vivek S. Borkar,et al. The actor-critic algorithm as multi-time-scale stochastic approximation , 1997 .
[10] Quanquan Gu,et al. A Finite Time Analysis of Two Time-Scale Actor Critic Methods , 2020, NeurIPS.
[11] Etienne Perot,et al. Deep Reinforcement Learning framework for Autonomous Driving , 2017, Autonomous Vehicles and Machines.
[12] Zhaoran Wang,et al. Neural Policy Gradient Methods: Global Optimality and Rates of Convergence , 2019, ICLR.
[13] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[14] Pierre Baldi,et al. Solving the Rubik’s cube with deep reinforcement learning and search , 2019, Nature Machine Intelligence.
[15] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[16] Richard L. Tweedie,et al. Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.
[17] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[18] A. McNabb. Comparison theorems for differential equations , 1986 .
[19] John N. Tsitsiklis,et al. Gradient Convergence in Gradient methods with Errors , 1999, SIAM J. Optim..
[20] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[21] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[22] Konstantinos Spiliopoulos,et al. Asymptotics of Reinforcement Learning with Neural Networks , 2021, Stochastic Systems.
[23] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .