Finite State and Action MDPs

In this chapter we study Markov decision processes (MDPs) with finite state and action spaces. This is the classical theory, developed since the late 1950s. We consider both finite and infinite horizon models. For the finite horizon model, the total expected reward is the commonly used utility function. For the infinite horizon model, the choice of utility function is less obvious, and we consider several criteria: total discounted expected reward, average expected reward, and more sensitive optimality criteria, including Blackwell optimality. We end with a variety of other subjects.
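
For orientation, the criteria named above can be written out as follows. This is a sketch in standard textbook notation, which is an assumption here and not necessarily the chapter's own: $X_t$ and $Y_t$ denote the state and action at time $t$, $r(i,a)$ the one-step reward, $\pi$ a policy, $i$ the initial state, $T$ the horizon, and $\alpha \in [0,1)$ the discount factor.

```latex
% Finite horizon: total expected reward over T periods
v^{T}(i,\pi) = \mathbb{E}_{i,\pi}\Big[\sum_{t=1}^{T} r(X_t, Y_t)\Big]

% Infinite horizon: total discounted expected reward
v^{\alpha}(i,\pi) = \mathbb{E}_{i,\pi}\Big[\sum_{t=1}^{\infty} \alpha^{t-1}\, r(X_t, Y_t)\Big],
\qquad 0 \le \alpha < 1

% Infinite horizon: average expected reward (liminf, since the limit need not exist)
\phi(i,\pi) = \liminf_{T \to \infty} \frac{1}{T}\,
\mathbb{E}_{i,\pi}\Big[\sum_{t=1}^{T} r(X_t, Y_t)\Big]

% Blackwell optimality: \pi^{*} is discounted-optimal for all discount factors close enough to 1
\exists\, \alpha_0 \in [0,1):\quad
v^{\alpha}(i,\pi^{*}) \ge v^{\alpha}(i,\pi)
\quad \text{for all } i,\ \text{all } \pi,\ \text{and all } \alpha \in [\alpha_0, 1)
```

Under these definitions a Blackwell optimal policy is also average optimal, which is why it counts as a more sensitive criterion than the average reward alone.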
