A pattern-matrix learning algorithm for adaptive MDPs: The regularly communicating case (Theory and Application of Decision Analysis in Uncertain Situation)
[1] J. J. Martin. Bayesian Decision Problems and Markov Chains, 1967.
[2] P. Mandl, et al. Estimation and control in Markov chains, 1974, Advances in Applied Probability.
[3] S. Lakshmivarahan, et al. Learning Algorithms Theory and Applications, 1981.
[4] Arie Leizarowitz, et al. An Algorithm to Identify and Compute Average Optimal Policies in Multichain Markov Decision Processes, 2003, Math. Oper. Res.
[5] S. Marcus, et al. Adaptive control of discounted Markov decision chains, 1985.
[6] J. Bather. Optimal decision procedures for finite Markov chains. Part II: Communicating systems, 1973, Advances in Applied Probability.
[7] M. Kurano, et al. Temporal Difference-based Adaptive policies in Neuro-dynamic Programming, 2007.
[8] Masayuki Horiguchi, et al. A structured pattern matrix algorithm for multichain Markov decision processes, 2005, Math. Methods Oper. Res.
[9] P. Schweitzer, et al. Nonstationary Markov decision problems with converging parameters, 1981.
[10] Abraham Thomas, et al. Learning Algorithms for Markov Decision Processes, 2009.
[11] P. Schweitzer. Perturbation theory and finite Markov chains, 1968.
[12] Thomas G. Dietterich. Adaptive computation and machine learning, 1998.