Policy Gradient Semi-markov Decision Process
暂无分享,去创建一个
[1] G. T. Bartha. Advantage updating on a mobile robot , 1994, Proceedings of PerAc '94. From Perception to Action.
[2] Abhijit Gosavi,et al. Reinforcement learning for long-run average cost , 2004, Eur. J. Oper. Res..
[3] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[4] Peter L. Bartlett,et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[5] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[6] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[7] John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).
[8] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[9] Michael O. Duff,et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.
[10] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[11] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[12] P. Marbach. Simulation-Based Methods for Markov Decision Processes , 1998 .
[13] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[14] John N. Tsitsiklis,et al. Call admission control and routing in integrated services networks using neuro-dynamic programming , 2000, IEEE Journal on Selected Areas in Communications.
[15] Michael I. Jordan,et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.