The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

In this short paper we prove that the classic simplex method with the most-negative-reduced-cost pivoting rule (Dantzig 1947) for solving the Markov decision problem (MDP) with a fixed discount rate is a strongly polynomial-time algorithm. The result seems surprising, since this very pivoting rule was shown to be exponential for solving a general linear programming (LP) problem, and the simplex (or simple policy-iteration) method with the smallest-index pivoting rule was shown to be exponential for solving an MDP regardless of discount rates. As a corollary, the policy-iteration method (Howard 1960) is also a strongly polynomial-time algorithm for solving the MDP with a fixed discount rate.
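For concreteness, the following is a minimal sketch (not taken from the paper) of the simplex / simple policy-iteration step the abstract refers to, assuming a tabular discounted MDP given by a transition array P[s, a, s'], a reward array r[s, a], and a discount factor gamma; the function name simple_policy_iteration and the numerical tolerance are illustrative choices.

```python
import numpy as np

def simple_policy_iteration(P, r, gamma, max_iters=10_000):
    """Simplex-style (simple) policy iteration for a discounted MDP.

    P: array of shape (S, A, S), P[s, a, s'] = transition probability
    r: array of shape (S, A), immediate rewards
    gamma: discount factor in [0, 1)

    Each iteration switches the action at the single state-action pair with
    the largest advantage; in the LP formulation of the MDP this is the
    entering variable with the most negative reduced cost (Dantzig's rule).
    """
    S, A, _ = P.shape
    policy = np.zeros(S, dtype=int)           # start from an arbitrary policy
    for _ in range(max_iters):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi
        P_pi = P[np.arange(S), policy]        # (S, S) transition matrix of the policy
        r_pi = r[np.arange(S), policy]        # (S,) reward vector of the policy
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Advantages (negated reduced costs): r(s,a) + gamma * P(s,a)·v - v(s)
        adv = r + gamma * (P @ v) - v[:, None]
        s, a = np.unravel_index(np.argmax(adv), adv.shape)
        if adv[s, a] <= 1e-10:                # no improving pivot: policy is optimal
            return policy, v
        policy[s] = a                         # single pivot: one state changes its action
    return policy, v
```

Switching only the single most advantageous state-action pair per iteration corresponds to one simplex pivot under the most-negative-reduced-cost rule; updating every state with a positive advantage at once recovers Howard's policy-iteration method mentioned in the corollary.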

[1] George B. Dantzig et al. Optimal Solution of a Dynamic Leontief Model with Substitution. 1955.

[2] A. S. Manne. Linear Programming and Sequential Decisions. 1960.

[3] Ronald A. Howard et al. Dynamic Programming and Markov Processes. 1960.

[4] F. d'Epenoux et al. A Probabilistic Production and Inventory Problem. 1963.

[5] A. F. Veinott. Extreme Points of Leontief Substitution Systems. 1968.

[6] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria. 1969.

[7] Nesa L'abbe Wu et al. Linear Programming and Extensions. 1981.

[8] Dimitri P. Bertsekas et al. Dynamic Programming: Deterministic and Stochastic Models. 1987.

[9] John N. Tsitsiklis et al. The Complexity of Markov Decision Processes. Math. Oper. Res., 1987.

[10] P. Tseng. Solving H-Horizon, Stationary Markov Decision Problems in Time Proportional to log(H). 1990.

[11] Robert E. Bixby et al. Progress in Linear Programming. 1993.

[12] Martin L. Puterman et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 1994.

[13] Anne Condon et al. On the Complexity of the Policy Improvement Algorithm for Markov Decision Processes. INFORMS J. Comput., 1994.

[14] Leslie Pack Kaelbling et al. On the Complexity of Solving Markov Decision Problems. UAI, 1995.

[15] Yinyu Ye et al. A Primal-Dual Interior Point Method Whose Running Time Depends Only on the Constraint Matrix. Math. Program., 1996.

[16] Yishay Mansour et al. On the Complexity of Policy Iteration. UAI, 1999.

[17] Sean R. Eddy et al. What Is Dynamic Programming? Nature Biotechnology, 2004.

[18] Yinyu Ye et al. A New Complexity Result on Solving the Markov Decision Problem. Math. Oper. Res., 2005.

[19] John Fearnley et al. Exponential Lower Bounds for Policy Iteration. ICALP, 2010.

[20] U. Rieder et al. Markov Decision Processes. 2010.