Learning Algorithms for Markov Decision Processes with Average Cost
暂无分享,去创建一个
[1] F. Wilson,et al. Smoothing derivatives of functions and applications , 1969 .
[2] Carlos S. Kubrusly,et al. Stochastic approximation algorithms and applications , 1973, CDC 1973.
[3] Harold J. Kushner,et al. wchastic. approximation methods for constrained and unconstrained systems , 1978 .
[4] Henk Tijms,et al. Stochastic modelling and analysis: a computational approach , 1986 .
[5] A. Jalali,et al. Adaptive control of Markov chains with local updates , 1990 .
[6] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[7] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[8] Satinder P. Singh,et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes , 1994, AAAI.
[9] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[10] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[11] C. SIAMJ.. A NEW VALUE ITERATION METHOD FOR THE AVERAGE COST DYNAMIC PROGRAMMING PROBLEM∗ , 1995 .
[12] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[13] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[14] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[15] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[16] V. Borkar. Stochastic approximation with two time scales , 1997 .
[17] Vivek S. Borkar,et al. Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms , 1997, SIAM J. Control. Optim..
[18] V. Borkar. Recursive self-tuning control of finite Markov chains , 1997 .
[19] V. Borkar. Asynchronous Stochastic Approximations , 1998 .
[20] Dimitri P. Bertsekas,et al. Rollout Algorithms for Stochastic Scheduling Problems , 1999, J. Heuristics.
[21] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[22] V. Borkar. A LEARNING ALGORITHM FOR DISCRETE-TIME STOCHASTIC CONTROL , 2000, Probability in the Engineering and Informational Sciences.
[23] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[24] J. Walrand,et al. Distributed Dynamic Programming , 2022 .