Actor-Critic-Type Learning Algorithms for Markov Decision Processes
[1] F. Wilson, et al. Smoothing derivatives of functions and applications, 1969.
[2] J. Neveu, et al. Discrete Parameter Martingales, 1975.
[3] H. J. Kushner, D. S. Clark. Stochastic Approximation Methods for Constrained and Unconstrained Systems (Applied Mathematical Sciences 26). Springer-Verlag, Berlin-Heidelberg-New York, 1978.
[4] P. Kokotovic. Applications of Singular Perturbation Techniques to Control Problems, 1984.
[5] M. Schäl. Estimation and control in discounted stochastic dynamic programming, 1987.
[6] Morris W. Hirsch, et al. Convergent activation dynamics in continuous time networks, 1989, Neural Networks.
[7] R. Pemantle, et al. Nonconvergence to Unstable Points in Urn Models and Stochastic Approximations, 1990.
[8] John N. Tsitsiklis, et al. An Analysis of Stochastic Shortest Path Problems, 1991, Math. Oper. Res..
[9] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[10] B. Ravindran, et al. A tutorial survey of reinforcement learning, 1994.
[11] Vivek S. Borkar, et al. The actor-critic algorithm as multi-time-scale stochastic approximation, 1997.
[12] V. Borkar. Stochastic approximation with two time scales, 1997.
[13] V. Borkar. Recursive self-tuning control of finite Markov chains, 1997.
[14] P. S. Sastry, et al. A reinforcement learning neural network for adaptive control of Markov chains, 1997, IEEE Trans. Syst. Man Cybern. Part A.
[15] V. Borkar, et al. Stability and convergence of stochastic approximation using the ODE method, 1998, Proceedings of the 37th IEEE Conference on Decision and Control.
[16] D. Bertsekas. A New Value Iteration Method for the Average Cost Dynamic Programming Problem, 1998.
[17] V. Borkar. Asynchronous Stochastic Approximations, 1998.