We are concerned with Markov decision processes with countable state space and discrete-time parameter. The main structural restriction on the model is the following: under the action of any stationary policy, the state space is a communicating class. In this context, we prove the equivalence of ten stability/ergodicity conditions on the transition law of the model, which imply the existence of average-optimal stationary policies for an arbitrary continuous and bounded reward function; these conditions include the Lyapunov function condition (LFC) introduced by A. Hordijk. As a consequence of our results, the LFC is proved to be equivalent to the following: under the action of any stationary policy, the corresponding Markov chain has a unique invariant distribution which depends continuously on the stationary policy being used. A weak form of the latter condition was used by one of the authors to establish the existence of optimal stationary policies via an approach based on renewal theory.
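For orientation, the Lyapunov function condition mentioned above can be sketched as follows; the notation here (a distinguished state $z$, transition law $p(y\mid x,a)$, and Lyapunov function $\ell$) is assumed for illustration and is not fixed by the abstract itself. In its standard form, the LFC asks for a function $\ell\colon S \to [0,\infty)$ satisfying

```latex
% Sketch of Hordijk's Lyapunov function condition (LFC), in assumed
% notation: S is the state space, A(x) the admissible actions at x,
% p(y|x,a) the transition law, and z a distinguished reference state.
\[
  \ell(x) \;\ge\; 1 \;+\; \sum_{y \neq z} p(y \mid x, a)\,\ell(y)
  \qquad \text{for all } x \in S,\ a \in A(x),
\]
% together with a continuity requirement on the map
% a \mapsto \sum_{y \neq z} p(y \mid x, a)\,\ell(y).
```

Iterating this inequality shows, roughly, that $\ell(x)$ dominates the expected time to reach $z$ from $x$ under any stationary policy, which is the mechanism linking the LFC to the ergodicity and invariant-distribution conditions discussed in the abstract.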