Adaptive control of discounted Markov decision chains

In this paper, we consider discounted-reward finite-state Markov decision processes which depend on unknown parameters. An adaptive policy inspired by the nonstationary value iteration scheme of Federgruen and Schweitzer (Ref. 1) is proposed. This policy is briefly compared with the principle of estimation and control recently obtained by Schäl (Ref. 4).
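The idea of coupling parameter estimation with nonstationary value iteration can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' policy. It assumes the unknown "parameter" is the transition law itself, estimated by empirical frequencies with add-one smoothing; the function names `bellman_step` and `adaptive_nvi` are hypothetical. At each stage the controller takes one value-iteration step with the current estimate, v_t = T_{θ_t} v_{t-1}, and applies an action that attains the maximum in that step at the current state.

```python
import numpy as np


def bellman_step(v, P, r, beta):
    """Apply the discounted dynamic-programming operator T once.

    P[a, x, y]: probability of moving from state x to y under action a, shape (A, S, S).
    r[x, a]:    one-stage reward, shape (S, A).
    beta:       discount factor in (0, 1).
    Returns (T v, an action attaining the maximum in each state).
    """
    # q[x, a] = r(x, a) + beta * sum_y P[a, x, y] * v[y]
    q = r + beta * np.einsum("axy,y->xa", P, v)
    return q.max(axis=1), q.argmax(axis=1)


def adaptive_nvi(true_P, r, beta, horizon, x0=0, seed=0):
    """Hypothetical adaptive controller built on nonstationary value iteration.

    Assumption: the unknown parameter is the transition law itself. At each stage:
      1. estimate the law from the transitions observed so far
         (empirical frequencies with add-one smoothing, an assumed estimator);
      2. take ONE value-iteration step with that estimate: v_t = T_{theta_t} v_{t-1};
      3. apply an action attaining the maximum in that step at the current state.
    true_P is used only to simulate the system, never by the controller.
    """
    rng = np.random.default_rng(seed)
    A, S, _ = true_P.shape
    counts = np.ones((A, S, S))  # add-one smoothing: every row is a valid distribution
    v = np.zeros(S)
    x = x0
    for _ in range(horizon):
        P_hat = counts / counts.sum(axis=2, keepdims=True)  # current estimate theta_t
        v, policy = bellman_step(v, P_hat, r, beta)         # one NVI step
        a = policy[x]
        y = rng.choice(S, p=true_P[a, x])                   # system evolves under the true law
        counts[a, x, y] += 1
        x = y
    return v, counts / counts.sum(axis=2, keepdims=True)


if __name__ == "__main__":
    # Two states, two actions; both actions move to either state with prob. 1/2.
    true_P = np.full((2, 2, 2), 0.5)
    r = np.array([[1.0, 0.0],   # state 0: action 0 pays 1
                  [0.0, 2.0]])  # state 1: action 1 pays 2
    v, P_hat = adaptive_nvi(true_P, r, beta=0.9, horizon=10_000)
    print(v, P_hat)
```

For this toy model the optimal values of the true model are (14.5, 15.5), and the adaptive iterates should end up close to them because the estimate of the relevant transition rows converges. The sketch does not force exploration, so consistency of the estimator for state-action pairs that are rarely visited is simply assumed here; conditions of that kind are exactly what an analysis of adaptive policies has to supply.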

S. Marcus | O. Hernández-Lerma

[1] K. Hinderer, et al. Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, 1970.

[2] P. Mandl, et al. Estimation and control in Markov chains, 1974, Advances in Applied Probability.

[3] Manfred Schäl, et al. Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal, 1975.

[4] Dimitri P. Bertsekas, et al. Dynamic Programming and Stochastic Control, 1977, IEEE Transactions on Systems, Man, and Cybernetics.

[5] J. P. Georgin, et al. Estimation et controle des chaines de Markov sur des espaces arbitraires, 1978.

[6] L. Ljung. Convergence analysis of parametric identification methods, 1978.

[7] Michael Kolonko, et al. Optimal Control of Semi-Markov Chains under Uncertainty with Applications to Queueing Models, 1980.

[8] Recursive algorithms of adaptive control in stochastic systems, 1981, Cybernetics.

[9] P. Schweitzer, et al. Nonstationary Markov decision problems with converging parameters, 1981.

[10] M. Kolonko. Strongly consistent estimation in a controlled Markov renewal model, 1982.

[11] Michael Kolonko, et al. The average-optimal adaptive control of a Markov renewal model in presence of an unknown parameter, 1982.

[12] Steven I. Marcus, et al. Adaptive control of service in queueing systems, 1983.

[13] Steven I. Marcus, et al. Optimal adaptive control of priority assignment in queueing systems, 1984.