[1] Carl D. Meyer, et al. Matrix Analysis and Applied Linear Algebra, 2000.
[2] H. Robbins. A Stochastic Approximation Method, 1951.
[3] Keith Ross, et al. On the Convergence of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning, 2020, ArXiv.
[4] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[5] Yuanlong Chen, et al. On the Convergence of Optimistic Policy Iteration for Stochastic Shortest Path Problem, 2018, ArXiv.
[6] John N. Tsitsiklis, et al. An Analysis of Stochastic Shortest Path Problems, 1991, Math. Oper. Res..
[7] Dimitri P. Bertsekas, et al. Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming, 2010, 49th IEEE Conference on Decision and Control (CDC).
[8] John N. Tsitsiklis, et al. On the Convergence of Optimistic Policy Iteration, 2002, J. Mach. Learn. Res..
[9] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[10] David Silver, et al. Reinforcement Learning and Simulation-Based Search in Computer Go, 2009.
[11] E. Denardo. Contraction Mappings in the Theory Underlying Dynamic Programming, 1967.
[12] Peter Dayan, et al. Q-Learning, 1992, Machine Learning.
[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998.
[14] R. Ash, et al. Real Analysis and Probability, 1975.
[15] Richard S. Sutton, et al. Learning to Predict by the Methods of Temporal Differences, 1988, Machine Learning.