Survey of linear programming for standard and nonstandard Markovian control problems. Part I: Theory

This paper gives an overview of linear programming methods for solving standard and nonstandard Markovian control problems. Standard problems are those with the usual criteria, such as expected total (discounted) rewards and average expected rewards; we also discuss a particular class of stochastic games. Nonstandard problems involve additional considerations, such as side constraints, multiple criteria, or mean-variance tradeoffs. A second, companion paper discusses efficient linear programming algorithms for some applications.
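The sketch below illustrates only the standard discounted case and is not taken from the paper. It assumes a finite MDP with transition probabilities p(j|s,a), rewards r(s,a), a discount factor 0 <= gamma < 1 and strictly positive state weights beta(j). It solves the LP: maximize sum_{s,a} r(s,a) x(s,a) subject to sum_a x(j,a) - gamma * sum_{s,a} p(j|s,a) x(s,a) = beta(j) for every state j, and x >= 0. The dual of this LP is: minimize sum_j beta(j) v(j) subject to v(s) >= r(s,a) + gamma * sum_j p(j|s,a) v(j). Because beta > 0, the columns selected by any deterministic stationary policy form a feasible basis, so the simplex method can start from a policy without a phase I. At the optimum, the basic columns give an optimal policy and the simplex multipliers give the optimal value vector v. The helper names simplex_max and solve_discounted_mdp, the requirement that every state has an action 0, and the two-state example are all hypothetical choices made for this sketch. Exact Fraction arithmetic and Bland's rule keep it short and free of cycling, at the cost of speed.

```python
from fractions import Fraction


def simplex_max(A, b, c, basis):
    """Maximize c.x subject to A x = b, x >= 0, from a known feasible basis.

    Uses exact Fraction arithmetic and Bland's rule (no cycling). `basis` holds
    one column index per row and is assumed, not checked, to be feasible.
    Returns (x, basis, y), where y = c_B B^{-1} are the simplex multipliers.
    """
    m, n = len(A), len(c)
    # Tableau row i: [A_i | e_i | b_i]; the identity block ends up as B^{-1}.
    T = [[Fraction(v) for v in A[i]]
         + [Fraction(int(i == k)) for k in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    basis = list(basis)

    def pivot(r, col):
        piv = T[r][col]
        T[r] = [v / piv for v in T[r]]
        for i in range(m):
            if i != r and T[i][col] != 0:
                f = T[i][col]
                T[i] = [vi - f * vr for vi, vr in zip(T[i], T[r])]

    # Bring the tableau into canonical form for the starting basis.
    for r, col in enumerate(basis):
        k = next((i for i in range(r, m) if T[i][col] != 0), None)
        if k is None:
            raise ValueError("starting basis is singular")
        T[r], T[k] = T[k], T[r]
        pivot(r, col)

    while True:
        cb = [Fraction(c[j]) for j in basis]
        # Bland's rule: the smallest-index column with positive reduced cost enters.
        entering = next((j for j in range(n) if j not in basis and
                         c[j] - sum(cb[i] * T[i][j] for i in range(m)) > 0), None)
        if entering is None:
            break
        rows = [i for i in range(m) if T[i][entering] > 0]
        if not rows:
            raise ValueError("LP is unbounded")
        # Minimum-ratio test; ties broken by the smallest basic column index.
        r = min(rows, key=lambda i: (T[i][-1] / T[i][entering], basis[i]))
        pivot(r, entering)
        basis[r] = entering

    x = [Fraction(0)] * n
    for i, j in enumerate(basis):
        x[j] = T[i][-1]
    y = [sum(cb[i] * T[i][n + k] for i in range(m)) for k in range(m)]
    return x, basis, y


def solve_discounted_mdp(P, R, gamma, beta):
    """Hypothetical helper: P[s][a][j] transition probability, R[s][a] reward,
    beta[j] > 0 state weight. Returns (policy, v, objective)."""
    S = len(P)
    pairs = [(s, a) for s in range(S) for a in range(len(P[s]))]
    # Row j: sum_a x(j,a) - gamma * sum_{s,a} P[s][a][j] x(s,a) = beta[j].
    A = [[int(s == j) - gamma * P[s][a][j] for (s, a) in pairs] for j in range(S)]
    c = [R[s][a] for (s, a) in pairs]
    # Any deterministic policy is a feasible basis when beta > 0;
    # assume every state has an action 0 and start from "action 0 everywhere".
    start = [pairs.index((s, 0)) for s in range(S)]
    x, basis, v = simplex_max(A, beta, c, start)
    policy = [None] * S
    for j in basis:  # exactly one basic column per state
        s, a = pairs[j]
        policy[s] = a
    objective = sum(c[j] * x[j] for j in range(len(c)))
    return policy, v, objective


# Toy two-state example (hypothetical data).
P = [[[1, 0], [0, 1]],   # state 0: action 0 stays, action 1 moves to state 1
     [[0, 1], [1, 0]]]   # state 1: action 0 stays, action 1 moves to state 0
R = [[1, 0],
     [3, 0]]
policy, v, obj = solve_discounted_mdp(P, R, Fraction(1, 2), [1, 1])
print(policy, v, obj)  # [1, 0] [Fraction(3, 1), Fraction(6, 1)] 9
```

With gamma = 1/2, the sketch moves in a single pivot from the starting policy (0, 0), whose weighted value is 8, to the optimal policy (1, 0). The multipliers (3, 6) satisfy v = r + gamma * P v under that policy.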
