Paper information - A pattern-matrix learning algorithm for adaptive MDPs: The regularly communicating case (Theory and Application of Decision Analysis in Uncertain Situation)

In this note, as a sequel to our previous work, we are concerned with adaptive models for uncertain Markov decision processes with a regularly communicating structure, in which the state space is decomposed into a single communicating class and an absolutely transient class. We give a pattern-matrix learning algorithm that finds the regularly communicating structure, by which an asymptotic sequence of adaptive policies with nearly average-optimal properties is constructed. A numerical experiment is given.

Keywords: optimal adaptive policy, regularly communicating case.
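To make the structural idea concrete, the following is a minimal sketch of how a 0/1 pattern matrix built from observed transitions could be used to split the state space into one closed communicating class and the remaining states. It is not the algorithm of the paper: the function names, the `counts[i][a][j]` layout, and the choice of taking the union of transition patterns over all actions are illustrative assumptions. The sketch only inspects the graph structure; it does not verify that the remaining states are absolutely transient in the paper's sense.

```python
def pattern_matrix(counts):
    """Build a boolean pattern matrix from transition counts.

    Assumption: counts[i][a][j] is the number of observed transitions
    from state i to state j under action a (hypothetical layout).
    Entry (i, j) is True if some action has moved i to j at least once.
    """
    n = len(counts)
    return [
        [any(counts[i][a][j] > 0 for a in range(len(counts[i]))) for j in range(n)]
        for i in range(n)
    ]


def reachability(pattern):
    """Reflexive-transitive closure of the pattern matrix (Warshall's algorithm)."""
    n = len(pattern)
    reach = [[pattern[i][j] or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def split_states(pattern):
    """Return (communicating, rest) if exactly one closed class exists, else None.

    A class is a set of mutually reachable states; it is closed if no
    state in it reaches a state outside it.
    """
    n = len(pattern)
    reach = reachability(pattern)
    closed, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        cls = {j for j in range(n) if reach[i][j] and reach[j][i]}
        seen |= cls
        if all(j in cls for s in cls for j in range(n) if reach[s][j]):
            closed.append(cls)
    if len(closed) != 1:
        return None
    comm = closed[0]
    return sorted(comm), sorted(set(range(n)) - comm)


# Toy data: 3 states, 2 actions. State 0 always moves to state 1;
# states 1 and 2 move only between themselves.
counts = [
    [[0, 3, 0], [0, 2, 0]],
    [[0, 0, 4], [0, 1, 0]],
    [[0, 5, 0], [0, 0, 2]],
]
print(split_states(pattern_matrix(counts)))
```

In an adaptive setting the counts would grow as the process is observed, so the pattern matrix and the resulting split could be recomputed as new transitions appear.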

Masami Yasuda | Masayuki Horiguchi | Masami Kurano | Tetsuichiro Iki

[1] J. J. Martin. Bayesian Decision Problems and Markov Chains, 1967.

[2] P. Mandl, et al. Estimation and control in Markov chains, 1974, Advances in Applied Probability.

[3] S. Lakshmivarahan, et al. Learning Algorithms Theory and Applications, 1981.

[4] Arie Leizarowitz, et al. An Algorithm to Identify and Compute Average Optimal Policies in Multichain Markov Decision Processes, 2003, Math. Oper. Res.

[5] S. Marcus, et al. Adaptive control of discounted Markov decision chains, 1985.

[6] J. Bather. Optimal decision procedures for finite Markov chains. Part II: Communicating systems, 1973, Advances in Applied Probability.

[7] M. Kurano, et al. Temporal Difference-based Adaptive policies in Neuro-dynamic Programming, 2007.

[8] Masayuki Horiguchi, et al. A structured pattern matrix algorithm for multichain Markov decision processes, 2005, Math. Methods Oper. Res.

[9] P. Schweitzer, et al. Nonstationary Markov decision problems with converging parameters, 1981.

[10] Abraham Thomas, et al. Learning Algorithms for Markov Decision Processes, 2009.

[11] P. Schweitzer. Perturbation theory and finite Markov chains, 1968.

[12] Thomas G. Dietterich. Adaptive computation and machine learning, 1998.