Asymptotic expansions for dynamic programming recursions with general nonnegative matrices

Abstract

This paper is concerned with the study of the asymptotic behavior of dynamic programming recursions of the form $$x(n + 1) = \max_{P \in \mathcal{K}} Px(n), \quad n = 0,1,2,\ldots,$$ where $\mathcal{K}$ denotes a set of matrices, generated by all possible interchanges of corresponding rows, taken from a fixed finite set of nonnegative square matrices. These recursions arise in a number of well-known and frequently studied problems, e.g. in the theory of controlled Markov chains, Leontief substitution systems, controlled branching processes, etc. Results concerning the asymptotic behavior of $x(n)$, for $n \to \infty$, are established in terms of the maximal spectral radius, the maximal index, and a set of generalized eigenvectors. A key role in the analysis is played by a geometric convergence result for value iteration in undiscounted multichain Markov decision processes. A new proof of this result is also presented.
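
As an illustration of the recursion only (not of the paper's asymptotic results), the following is a minimal Python sketch. It relies on the fact that, because $\mathcal{K}$ contains every matrix obtained by interchanging corresponding rows, the componentwise maximum of $Px$ over $\mathcal{K}$ can be computed row by row: component $i$ of $x(n+1)$ is the largest value of (row $i$ of $P_k$) times $x(n)$ over the generating matrices $P_k$. The function names `dp_step` and `iterate`, and the example matrices `A` and `B`, are hypothetical and chosen purely for illustration.

```python
import numpy as np

def dp_step(matrices, x):
    """One step x -> max_{P in K} P x, taken componentwise.

    `matrices` is the fixed finite set of nonnegative square matrices
    that generates K by row interchanges (an assumption of this sketch:
    all matrices have the same shape N x N).
    """
    stacked = np.stack(matrices)          # shape (m, N, N)
    products = stacked @ x                # shape (m, N): row k holds P_k x
    return products.max(axis=0)           # best row choice per component

def iterate(matrices, x0, n_steps):
    """Apply the recursion n_steps times starting from x(0) = x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = dp_step(matrices, x)
    return x

# Hypothetical example data: two nonnegative 2 x 2 generating matrices.
A = np.array([[1.0, 0.0],
              [0.0, 2.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

if __name__ == "__main__":
    print(iterate([A, B], [1.0, 1.0], 3))
```

In this toy example the iterates eventually double at each step, which is the kind of geometric growth that the abstract describes in terms of the maximal spectral radius. The sketch does not compute the maximal index or the generalized eigenvectors.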
