A policy gradient method for SMDPs with application to call admission control
[1] V. Tadić. Almost sure convergence of two time-scale stochastic approximation algorithms, 2004, Proceedings of the 2004 American Control Conference.
[2] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).
[3] W. D. Ray, et al. Stochastic Models: An Algorithmic Approach, 1995.
[4] S. Mahadevan, et al. Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning, 1999.
[5] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[6] Keith W. Ross, et al. Multiservice Loss Models for Broadband Telecommunication Networks, 1997.
[7] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[8] Michael O. Duff, et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.