A LEARNING ALGORITHM FOR DISCRETE-TIME STOCHASTIC CONTROL
[1] J. Neveu, et al. Discrete Parameter Martingales, 1975.
[2] Lennart Ljung, et al. Analysis of recursive stochastic algorithms, 1977.
[3] Dimitri P. Bertsekas. Distributed Computation of Fixed Points, 1981.
[4] Morris W. Hirsch, et al. Convergent activation dynamics in continuous time networks, 1989, Neural Networks.
[5] Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.
[6] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.
[7] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[8] V. Borkar. Distributed computation of fixed points of ∞-nonexpansive maps, 1996.
[9] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[10] Vivek S. Borkar, et al. Multiscale Stochastic Approximation for Parametric Optimization of Hidden Markov Models, 1997, Probability in the Engineering and Informational Sciences.
[11] V. Borkar. Stochastic approximation with two time scales, 1997.
[12] Vivek S. Borkar, et al. Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms, 1997, SIAM J. Control. Optim.
[13] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[14] Vivek S. Borkar, et al. An analog scheme for fixed-point computation - Part II: Applications, 1999.
[15] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim.
[16] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control. Optim.
[17] Vivek S. Borkar, et al. Learning Algorithms for Markov Decision Processes with Average Cost, 2001, SIAM J. Control. Optim.