The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

In this short paper we prove that the classic simplex method with the most-negative-reduced-cost pivoting rule (Dantzig 1947) for solving the Markov decision problem (MDP) with a fixed discount rate is a strongly polynomial-time algorithm. The result seems surprising, since this very pivoting rule was shown to be exponential for solving a general linear programming (LP) problem, and the simplex (or simple policy-iteration) method with the smallest-index pivoting rule was shown to be exponential for solving an MDP regardless of discount rates. As a corollary, the policy-iteration method (Howard 1960) is also a strongly polynomial-time algorithm for solving the MDP with a fixed discount rate.
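For concreteness, the following is a minimal sketch (not taken from the paper) of the simplex / simple policy-iteration step the abstract refers to, assuming a tabular discounted MDP given by a transition array P[s, a, s'], a reward array r[s, a], and a discount factor gamma; the function name simple_policy_iteration and the numerical tolerance are illustrative choices.

```python
import numpy as np

def simple_policy_iteration(P, r, gamma, max_iters=10_000):
    """Simplex-style (simple) policy iteration for a discounted MDP.

    P: array of shape (S, A, S), P[s, a, s'] = transition probability
    r: array of shape (S, A), immediate rewards
    gamma: discount factor in [0, 1)

    Each iteration switches the action at the single state-action pair with
    the largest advantage; in the LP formulation of the MDP this is the
    entering variable with the most negative reduced cost (Dantzig's rule).
    """
    S, A, _ = P.shape
    policy = np.zeros(S, dtype=int)           # start from an arbitrary policy
    for _ in range(max_iters):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi
        P_pi = P[np.arange(S), policy]        # (S, S) transition matrix of the policy
        r_pi = r[np.arange(S), policy]        # (S,) reward vector of the policy
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Advantages (negated reduced costs): r(s,a) + gamma * P(s,a)·v - v(s)
        adv = r + gamma * (P @ v) - v[:, None]
        s, a = np.unravel_index(np.argmax(adv), adv.shape)
        if adv[s, a] <= 1e-10:                # no improving pivot: policy is optimal
            return policy, v
        policy[s] = a                         # single pivot: one state changes its action
    return policy, v
```

Switching only the single most advantageous state-action pair per iteration corresponds to one simplex pivot under the most-negative-reduced-cost rule; updating every state with a positive advantage at once recovers Howard's policy-iteration method mentioned in the corollary.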

[1] George B. Dantzig et al. Optimal Solution of a Dynamic Leontief Model with Substitution. 1955.

[2] A. S. Manne. Linear Programming and Sequential Decisions. 1960.

[3] Ronald A. Howard et al. Dynamic Programming and Markov Processes. 1960.

[4] F. d'Epenoux et al. A Probabilistic Production and Inventory Problem. 1963.

[5] A. F. Veinott. Extreme Points of Leontief Substitution Systems. 1968.

[6] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria. 1969.

[7] Nesa L'abbe Wu et al. Linear Programming and Extensions. 1981.

[8] Dimitri P. Bertsekas et al. Dynamic Programming: Deterministic and Stochastic Models. 1987.

[9] John N. Tsitsiklis et al. The Complexity of Markov Decision Processes. Math. Oper. Res., 1987.

[10] P. Tseng. Solving H-Horizon, Stationary Markov Decision Problems in Time Proportional to log(H). 1990.

[11] Robert E. Bixby et al. Progress in Linear Programming. 1993.

[12] Martin L. Puterman et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 1994.

[13] Anne Condon et al. On the Complexity of the Policy Improvement Algorithm for Markov Decision Processes. INFORMS J. Comput., 1994.

[14] Leslie Pack Kaelbling et al. On the Complexity of Solving Markov Decision Problems. UAI, 1995.

[15] Yinyu Ye et al. A Primal-Dual Interior Point Method Whose Running Time Depends Only on the Constraint Matrix. Math. Program., 1996.

[16] Yishay Mansour et al. On the Complexity of Policy Iteration. UAI, 1999.

[17] Sean R. Eddy et al. What Is Dynamic Programming? Nature Biotechnology, 2004.

[18] Yinyu Ye et al. A New Complexity Result on Solving the Markov Decision Problem. Math. Oper. Res., 2005.

[19] John Fearnley et al. Exponential Lower Bounds for Policy Iteration. ICALP, 2010.

[20] U. Rieder et al. Markov Decision Processes. 2010.