The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
[1] J. Neveu, et al. Discrete Parameter Martingales, 1975.
[2] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[3] Morris W. Hirsch, et al. Convergent activation dynamics in continuous time networks, 1989, Neural Networks.
[4] Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.
[5] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[6] T. Sargent. Bounded rationality in macroeconomics, 1993.
[7] S. Meyn, et al. Computable Bounds for Geometric Convergence Rates of Markov Chains, 1994.
[8] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.
[9] John N. Tsitsiklis, et al. Asynchronous stochastic approximation and Q-learning, 1994, Mach. Learn.
[10] J. Dai. On Positive Harris Recurrence of Multiclass Queueing Networks: A Unified Approach Via Fluid Limit Models, 1995.
[11] Sean P. Meyn, et al. Stability and convergence of moments for multiclass queueing networks via fluid limit models, 1995, IEEE Trans. Autom. Control.
[12] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[13] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[14] Harold J. Kushner, et al. Stochastic Approximation Algorithms and Applications, 1997, Applications of Mathematics.
[15] V. Borkar. Stochastic approximation with two time scales, 1997.
[16] V. Borkar. Recursive self-tuning control of finite Markov chains, 1997.
[17] V. Borkar, et al. An analog scheme for fixed point computation. I. Theory, 1997.
[18] V. Borkar. Asynchronous Stochastic Approximations, 1998.
[19] Ann Appl, et al. On the Positive Harris Recurrence for Multiclass Queueing Networks: a Unified Approach via Fluid Limit Models, 1999.
[20] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim.
[21] Vivek S. Borkar, et al. Learning Algorithms for Markov Decision Processes with Average Cost, 2001, SIAM J. Control. Optim.