We are concerned with Markov decision processes with countable state space and discrete-time parameter. The main structural restriction on the model is the following: under the action of any stationary policy, the state space is a communicating class. In this context, we prove the equivalence of ten stability/ergodicity conditions on the transition law of the model, which imply the existence of average-optimal stationary policies for an arbitrary continuous and bounded reward function; these conditions include the Lyapunov function condition (LFC) introduced by A. Hordijk. As a consequence of our results, the LFC is proved to be equivalent to the following: under the action of any stationary policy, the corresponding Markov chain has a unique invariant distribution which depends continuously on the stationary policy being used. A weak form of the latter condition was used by one of the authors to establish the existence of optimal stationary policies via an approach based on renewal theory.
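For orientation, the Lyapunov function condition mentioned above can be sketched as follows; the notation here (a distinguished state $z$, transition law $p(y\mid x,a)$, and Lyapunov function $\ell$) is assumed for illustration and is not fixed by the abstract itself. In its standard form, the LFC asks for a function $\ell\colon S \to [0,\infty)$ satisfying

```latex
% Sketch of Hordijk's Lyapunov function condition (LFC), in assumed
% notation: S is the state space, A(x) the admissible actions at x,
% p(y|x,a) the transition law, and z a distinguished reference state.
\[
  \ell(x) \;\ge\; 1 \;+\; \sum_{y \neq z} p(y \mid x, a)\,\ell(y)
  \qquad \text{for all } x \in S,\ a \in A(x),
\]
% together with a continuity requirement on the map
% a \mapsto \sum_{y \neq z} p(y \mid x, a)\,\ell(y).
```

Iterating this inequality shows, roughly, that $\ell(x)$ dominates the expected time to reach $z$ from $x$ under any stationary policy, which is the mechanism linking the LFC to the ergodicity and invariant-distribution conditions discussed in the abstract.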