Continuous-time Markov decision process with average reward: Using reinforcement learning method
[1] Michael O. Duff, et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.
[2] Manuela M. Veloso, et al. Decentralized MDPs with sparse interactions, 2011, Artif. Intell.
[3] J. Banks, et al. Discrete-Event System Simulation, 1995.
[4] Xianping Guo, et al. Continuous-Time Markov Decision Processes: Theory and Applications, 2009.
[5] Roger W. Brockett, et al. Optimal control of observable continuous time Markov chains, 2008, 2008 47th IEEE Conference on Decision and Control.
[6] Xi-Ren Cao, et al. Stochastic learning and optimization - A sensitivity-based approach, 2007, Annual Reviews in Control.
[7] Xi Chen, et al. Policy iteration based feedback control, 2008, Autom.
[8] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.