Paper information - LEARNING ALGORITHMS FOR MARKOV DECISION PROCESSES

This study concerns finite Markov decision processes in which the dynamics and reward structure are unknown but the state is observed exactly. We establish a learning algorithm that yields an optimal policy, and we construct an adaptive policy that is optimal under the average expected reward criterion.

M. Kurano

[1] Patrick Billingsley, et al. Statistical inference for Markov processes, 1961.

[2] Michel Loève, et al. Probability Theory I, 1977.

[3] P. Mandl, et al. Estimation and control in Markov chains, 1974, Advances in Applied Probability.

[4] K. M. van Hee, et al. Bayesian control of Markov chains, 1978.

[5] Thomas R. Jefferson, et al. On the analysis of an information theoretic model of spatial interaction, 1980, Inf. Sci.

[6] S. Lakshmivarahan, et al. Learning Algorithms Theory and Applications, 1981.

[7] P. Schweitzer, et al. Nonstationary Markov decision problems with converging parameters, 1981.

[8] S. Lakshmivarahan, et al. ε-Optimality of a general class of learning algorithms, 1982, Inf. Sci.

[9] Masami Kurano. Adaptive Policies in Markov Decision Processes with Uncertain Transition Matrices, 1983.

[10] S. Marcus, et al. Adaptive control of discounted Markov decision chains, 1985.