Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes
[1] R. Bellman. Dynamic programming, 1957, Science.
[2] Dimitri Bertsekas, et al. Distributed dynamic programming, 1981, 1981 20th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes.
[3] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[4] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[5] C. Watkins. Learning from delayed rewards, 1989.
[6] A. Jalali, et al. Adaptive control of Markov chains with local updates, 1990.
[7] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[8] Satinder Singh, et al. Learning to Solve Markovian Decision Processes, 1993.
[9] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[10] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[11] B. Pasik-Duncan, et al. Adaptive Control, 1996, IEEE Control Systems.