Multiplicative Markov Decision Chains

Previous treatments of multiplicative Markov decision chains, e.g., Bellman [Bellman, R. 1957. Dynamic Programming. Princeton University Press, Princeton, New Jersey.], Mandl [Mandl, P. 1967. An iterative method for maximizing the characteristic root of positive matrices. Rev. Roumaine Math. Pures Appl. XII, 1317--1322.], and Howard and Matheson [Howard, R. A., Matheson, J. E. 1972. Risk-sensitive Markov decision processes. Management Sci. 8, 356--369.], restricted attention to stationary policies and assumed that all transition matrices are irreducible and aperiodic. They also used a "first term" optimality criterion, namely maximizing the spectral radius of the associated transition matrix. We give a constructive proof of the existence of optimal policies among all policies under new cumulative average optimality criteria, which are more sensitive than maximization of the spectral radius. The algorithm for finding an optimal policy first searches for a stationary policy with a nonnilpotent transition matrix, provided such a policy exists. Otherwise, the method still finds an optimal policy, though in this case the set of optimal policies usually does not contain a stationary policy. When a stationary policy with a nonnilpotent transition matrix does exist, we develop a policy improvement algorithm that finds a stationary optimal policy.
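To make the classical "first term" criterion concrete: over stationary policies it ranks each policy by the spectral radius of the nonnegative matrix that policy induces. The following sketch enumerates the stationary policies of a hypothetical two-state, two-action instance (all numbers invented for illustration; this is the earlier spectral-radius criterion, not the paper's more sensitive cumulative average criteria or its policy improvement algorithm):

```python
from itertools import product

import numpy as np

# Hypothetical instance: M[s][a] is the row of nonnegative weights when
# action a is chosen in state s. Multiplicative chains need not have
# stochastic rows, so some rows here sum to more or less than 1.
M = {
    0: {0: [0.5, 0.5], 1: [0.9, 0.0]},
    1: {0: [0.2, 0.7], 1: [0.0, 1.1]},
}

def spectral_radius(P):
    """Largest eigenvalue modulus of the matrix P."""
    return max(abs(np.linalg.eigvals(P)))

# A stationary policy picks one action per state; build its transition
# matrix row by row and rank policies by spectral radius.
best = None
for policy in product([0, 1], repeat=2):
    P = np.array([M[s][policy[s]] for s in range(2)])
    rho = spectral_radius(P)
    if best is None or rho > best[1]:
        best = (policy, rho)

print(best)  # the first policy attaining the largest spectral radius
```

Ties are possible (here two policies attain radius 1.1), which already hints at why a more sensitive criterion than the spectral radius alone is needed to discriminate among policies.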

[1] J. Gillis, et al. Matrix Iterative Analysis, 1961.

[2] R. Bellman. Dynamic Programming, 1957, Science.

[3] A. F. Veinott. On Finding Optimal Policies in Discrete Dynamic Programming with No Discounting, 1966.

[4] Tosio Kato. Perturbation Theory for Linear Operators, 1966.

[5] Robert A. Pollak, et al. Additive von Neumann-Morgenstern Utility Functions, 1967.

[6] E. Seneta, et al. The Theory of Non-negative Matrices in a Dynamic Programming Problem, 1969.

[7] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria, 1969.

[8] B. L. Miller, et al. Discrete Dynamic Programming with a Small Interest Rate, 1969.

[9] R. Howard, et al. Risk-Sensitive Markov Decision Processes, 1972.

[10] Karel Sladký, et al. On the Set of Optimal Controls for Markov Chains with Rewards, 1974, Kybernetika.

[11] U. Rothblum. Algebraic Eigenspaces of Nonnegative Matrices, 1975.

[12] U. Rothblum. Multivariate Constant Risk Posture, 1975.

[13] Uriel G. Rothblum, et al. Normalized Markov Decision Chains I: Sensitive Discount Optimality, 1975, Oper. Res.

[14] Karel Sladký, et al. On Dynamic Programming Recursions for Multiplicative Markov Decision Chains, 1976.

[15] S. Pliska. Optimization of Multitype Branching Processes, 1976.

[16] U. Rothblum. Normalized Markov Decision Chains II: Optimality of Nonstationary Policies, 1977.

[17] W. H. M. Zijm. Maximizing the Growth of the Utility Vector in a Dynamic Programming Model, 1979.

[18] W. Zijm. Nonnegative Matrices in Dynamic Programming, 1979.

[19] Karel Sladký. Bounds on Discrete Dynamic Programming Recursions I: Models with Non-negative Matrices, 1980, Kybernetika.

[20] U. Rothblum. Sensitive Growth Analysis of Multiplicative Systems I: The Dynamic Approach, 1981.

[21] Peter Whittle, et al. Growth Optimality for Branching Markov Decision Chains, 1982, Math. Oper. Res.