Multiplicative Markov Decision Chains

Previous treatments of multiplicative Markov decision chains, e.g., Bellman [Bellman, R. 1957. Dynamic Programming. Princeton University Press, Princeton, New Jersey.], Mandl [Mandl, P. 1967. An iterative method for maximizing the characteristic root of positive matrices. Rev. Roumaine Math. Pures Appl. XII, 1317--1322.], and Howard and Matheson [Howard, R. A., Matheson, J. E. 1972. Risk-sensitive Markov decision processes. Management Sci. 8, 356--369.], restricted attention to stationary policies and assumed that all transition matrices are irreducible and aperiodic. They also used a "first term" optimality criterion, namely maximizing the spectral radius of the associated transition matrix. We give a constructive proof of the existence of optimal policies among all policies under new cumulative average optimality criteria, which are more sensitive than maximization of the spectral radius. The algorithm for finding an optimal policy first searches for a stationary policy with a nonnilpotent transition matrix, provided such a policy exists. Otherwise, the method still finds an optimal policy, though in this case the set of optimal policies usually does not contain a stationary policy. When a stationary policy with a nonnilpotent transition matrix does exist, we develop a policy improvement algorithm that finds a stationary optimal policy.
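To make the classical "first term" criterion concrete: over stationary policies it ranks each policy by the spectral radius of the nonnegative matrix that policy induces. The following sketch enumerates the stationary policies of a hypothetical two-state, two-action instance (all numbers invented for illustration; this is the earlier spectral-radius criterion, not the paper's more sensitive cumulative average criteria or its policy improvement algorithm):

```python
from itertools import product

import numpy as np

# Hypothetical instance: M[s][a] is the row of nonnegative weights when
# action a is chosen in state s. Multiplicative chains need not have
# stochastic rows, so some rows here sum to more or less than 1.
M = {
    0: {0: [0.5, 0.5], 1: [0.9, 0.0]},
    1: {0: [0.2, 0.7], 1: [0.0, 1.1]},
}

def spectral_radius(P):
    """Largest eigenvalue modulus of the matrix P."""
    return max(abs(np.linalg.eigvals(P)))

# A stationary policy picks one action per state; build its transition
# matrix row by row and rank policies by spectral radius.
best = None
for policy in product([0, 1], repeat=2):
    P = np.array([M[s][policy[s]] for s in range(2)])
    rho = spectral_radius(P)
    if best is None or rho > best[1]:
        best = (policy, rho)

print(best)  # the first policy attaining the largest spectral radius
```

Ties are possible (here two policies attain radius 1.1), which already hints at why a more sensitive criterion than the spectral radius alone is needed to discriminate among policies.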

[1] J. Gillis, et al. Matrix Iterative Analysis, 1961.

[2] R. Bellman. Dynamic Programming, 1957, Science.

[3] A. F. Veinott. On Finding Optimal Policies in Discrete Dynamic Programming with No Discounting, 1966.

[4] Tosio Kato. Perturbation Theory for Linear Operators, 1966.

[5] Robert A. Pollak, et al. Additive von Neumann-Morgenstern Utility Functions, 1967.

[6] E. Seneta, et al. The Theory of Non-negative Matrices in a Dynamic Programming Problem, 1969.

[7] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria, 1969.

[8] B. L. Miller, et al. Discrete Dynamic Programming with a Small Interest Rate, 1969.

[9] R. Howard, et al. Risk-Sensitive Markov Decision Processes, 1972.

[10] Karel Sladký, et al. On the Set of Optimal Controls for Markov Chains with Rewards, 1974, Kybernetika.

[11] U. Rothblum. Algebraic Eigenspaces of Nonnegative Matrices, 1975.

[12] U. Rothblum. Multivariate Constant Risk Posture, 1975.

[13] Uriel G. Rothblum, et al. Normalized Markov Decision Chains I: Sensitive Discount Optimality, 1975, Oper. Res.

[14] Karel Sladký, et al. On Dynamic Programming Recursions for Multiplicative Markov Decision Chains, 1976.

[15] S. Pliska. Optimization of Multitype Branching Processes, 1976.

[16] U. Rothblum. Normalized Markov Decision Chains II: Optimality of Nonstationary Policies, 1977.

[17] W. H. M. Zijm. Maximizing the Growth of the Utility Vector in a Dynamic Programming Model, 1979.

[18] W. Zijm. Nonnegative Matrices in Dynamic Programming, 1979.

[19] Karel Sladký. Bounds on Discrete Dynamic Programming Recursions I: Models with Non-negative Matrices, 1980, Kybernetika.

[20] U. Rothblum. Sensitive Growth Analysis of Multiplicative Systems I: The Dynamic Approach, 1981.

[21] Peter Whittle, et al. Growth Optimality for Branching Markov Decision Chains, 1982, Math. Oper. Res.