Constrained Markov Decision Processes
Eitan Altman

INTRODUCTION
  Examples of Constrained Dynamic Control Problems
  On Solution Approaches for CMDPs with Expected Costs
  Other Types of CMDPs
  Cost Criteria and Assumptions
  The Convex Analytical Approach and Occupation Measures
  Linear Programming and Lagrangian Approach for CMDPs
  About the Methodology
  The Structure of the Book

PART ONE: FINITE MDPS

MARKOV DECISION PROCESSES
  The Model
  Cost Criteria and the Constrained Problem
  Some Notation
  The Dominance of Markov Policies

THE DISCOUNTED COST
  Occupation Measure and the Primal LP
  Dynamic Programming and Dual LP: the Unconstrained Case
  Constrained Control: Lagrangian Approach
  The Dual LP
  Number of Randomizations

THE EXPECTED AVERAGE COST
  Occupation Measure and the Primal LP
  Equivalent Linear Program
  The Dual Program
  Number of Randomizations

FLOW AND SERVICE CONTROL IN A SINGLE-SERVER QUEUE
  The Model
  The Lagrangian
  The Original Constrained Problem
  Structure of Randomization and Implementation Issues
  On Coordination Between Controllers
  Open Questions

PART TWO: INFINITE MDPS

MDPS WITH INFINITE STATE AND ACTION SPACES
  The Model
  Cost Criteria
  Mixed Policies and Topological Structures
  The Dominance of Markov Policies
  Aggregation of States
  Extra Randomization in the Policies
  Equivalent Quasi-Markov Model and Quasi-Markov Policies

THE TOTAL COST: CLASSIFICATION OF MDPS
  Transient and Absorbing MDPs
  MDPs with Uniform Lyapunov Functions
  Equivalence of MDPs with Unbounded and Bounded Costs
  Properties of MDPs with Uniform Lyapunov Functions
  Properties for Fixed Initial Distribution
  Examples of Uniform Lyapunov Functions
  Contracting MDPs

THE TOTAL COST: OCCUPATION MEASURES AND THE PRIMAL LP
  Occupation Measure
  Continuity of Occupation Measures
  More Properties of MDPs
  Characterization of Achievable Sets of Occupation Measures
  Relation Between Cost and Occupation Measure
  Dominating Classes of Policies
  Equivalent Linear Program
  The Dual Program

THE TOTAL COST: DYNAMIC AND LINEAR PROGRAMMING
  Non-Constrained Control: Dynamic and Linear Programming
  Superharmonic Functions and Linear Programming
  Set of Achievable Costs
  Constrained Control: Lagrangian Approach
  The Dual LP
  State Truncation
  A Second LP Approach for Optimal Mixed Policies
  More on Unbounded Costs

THE DISCOUNTED COST
  The Equivalent Total Cost Model
  Occupation Measure and LP
  Non-negative Immediate Cost
  Weak Contracting Assumptions and Lyapunov Functions
  Example: Flow and Service Control

THE EXPECTED AVERAGE COST
  Occupation Measures
  Completeness Properties of Stationary Policies
  Relation Between Cost and Occupation Measure
  Dominating Classes of Policies
  Equivalent Linear Program
  The Dual Program
  The Contracting Framework
  Other Conditions for the Uniform Integrability
  The Case of Uniform Lyapunov Conditions

EXPECTED AVERAGE COST: DYNAMIC PROGRAMMING AND LP
  The Non-Constrained Case: Optimality Inequality
  Non-Constrained Control: Cost Bounded Below
  Dynamic Programming and Uniform Lyapunov Function
  Superharmonic Functions and Linear Programming
  Set of Achievable Costs
  Constrained Control: Lagrangian Approach
  The Dual LP
  A Second LP Approach for Optimal Mixed Policies

PART THREE: ASYMPTOTIC METHODS AND APPROXIMATIONS

SENSITIVITY ANALYSIS
  Introduction
  Approximation of the Values
  Approximation and Robustness of the Policies

CONVERGENCE OF DISCOUNTED CONSTRAINED MDPS
  Convergence in the Discount Factor
  Convergence to the Expected Average Cost
  The Case of Uniform Lyapunov Function

CONVERGENCE AS THE HORIZON TENDS TO INFINITY
  The Discounted Cost
  The Expected Average Cost: Stationary Policies
  The Expected Average Cost: General Policies

STATE TRUNCATION AND APPROXIMATION
  The Approximating Sets of States
  Scheme I: the Total Cost
  Scheme II: the Total Cost
  Scheme III: the Total Cost
  The Expected Average Cost
  Infinite MDPs: on the Number of Randomizations

APPENDIX: CONVERGENCE OF PROBABILITY MEASURES
REFERENCES
LIST OF SYMBOLS AND NOTATION
INDEX
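The occupation-measure linear program that runs through the book can be illustrated on a toy finite discounted CMDP. The sketch below is a minimal, self-contained example with made-up numbers (the transition matrices, costs `c` and `d`, bound `V`, and initial distribution `beta` are all illustrative, not taken from the book), solved with `scipy.optimize.linprog`: the variables are the discounted occupation measure rho(x, a), the equality constraints are the flow-balance equations, one inequality bounds the constrained cost, and an optimal stationary policy is read off by normalizing rho over actions.

```python
import numpy as np
from scipy.optimize import linprog

# Toy discounted CMDP: 2 states, 2 actions (all numbers illustrative).
nS, nA, gamma = 2, 2, 0.9
# P[a][x][y] = probability of moving from state x to state y under action a
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
c = np.array([[1.0, 0.0],    # c[x][a]: immediate cost to minimize
              [2.0, 0.5]])
d = np.array([[0.0, 2.0],    # d[x][a]: immediate cost that is constrained
              [0.0, 1.0]])
V = 3.0                      # bound on the expected discounted d-cost
beta = np.array([1.0, 0.0])  # initial distribution

# Decision variables: occupation measure rho(x, a), flattened as index x*nA + a.
# Flow balance for each state y:
#   sum_a rho(y, a) - gamma * sum_{x,a} P(y|x,a) * rho(x, a) = beta(y)
A_eq = np.zeros((nS, nS * nA))
for y in range(nS):
    for x in range(nS):
        for a in range(nA):
            A_eq[y, x * nA + a] = float(y == x) - gamma * P[a, x, y]
b_eq = beta

# Single constraint: expected discounted d-cost at most V
A_ub = d.reshape(1, -1)
b_ub = np.array([V])

res = linprog(c=c.reshape(-1), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (nS * nA))
rho = res.x.reshape(nS, nA)
# An optimal stationary policy randomizes proportionally to rho:
policy = rho / rho.sum(axis=1, keepdims=True)
print("optimal constrained cost:", res.fun)
print("stationary policy (rows = states):", policy)
```

Note that the occupation measure here is unnormalized, so its total mass is 1/(1-gamma); the number of state-action pairs at which `policy` actually randomizes is bounded by the number of active constraints, which is the "number of randomizations" result discussed in Parts One and Three.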

[1]  M. Fréchet Convergence in probability , 1930 .

[2]  M. Kreĭn,et al.  On extreme points of regular convex sets , 1940 .

[3]  L. Shapley,et al.  Stochastic Games , 1953, Proceedings of the National Academy of Sciences.

[4]  M. Sion On general minimax theorems , 1958 .

[5]  Kai Lai Chung,et al.  Markov Chains with Stationary Transition Probabilities , 1961 .

[6]  A. S. Manne Linear Programming and Sequential Decisions , 1960 .

[8]  S. Friedman On Stochastic Approximations , 1963 .

[9]  F. d'Epenoux,et al.  A Probabilistic Production and Inventory Problem , 1963 .

[10]  Robert J. Aumann,et al.  28. Mixed and Behavior Strategies in Infinite Extensive Games , 1964 .

[11]  C. Derman,et al.  Some Remarks on Finite Horizon Markovian Decision Models , 1965 .

[12]  Onésimo Hernández-Lerma,et al.  Controlled Markov Processes , 1965 .

[13]  C. Derman,et al.  A Note on Memoryless Rules for Controlling Sequential Control Processes , 1966 .

[14]  G. Dantzig,et al.  On the continuity of the minimum set of a continuous function , 1967 .

[15]  E. Denardo,et al.  Multichain Markov Renewal Programs , 1968 .

[16]  S. Ross,et al.  An Example in Denumerable Decision Processes , 1968 .

[17]  L. Fisher,et al.  On Recurrent Denumerable Decision Processes , 1968 .

[18]  J. Kemeny,et al.  Denumerable Markov chains , 1969 .

[19]  K. Hinderer,et al.  Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter , 1970 .

[20]  Cyrus Derman,et al.  Finite State Markovian Decision Processes , 1970 .

[21]  Peter Kolesar,et al.  A Markovian Model for Hospital Admission Scheduling , 1970 .

[22]  E. Denardo On Linear Programming in a Markov Decision Problem , 1970 .

[23]  H. Kushner,et al.  Mathematical programming and the control of Markov chains , 1971 .

[24]  C. Derman,et al.  Constrained Markov Decision Chains , 1972 .

[25]  J. Bather Optimal decision procedures for finite Markov chains. Part II: Communicating systems , 1973, Advances in Applied Probability.

[26]  R. Tyrrell Rockafellar Conjugate Duality and Optimization , 1974 .

[27]  A. A. Yushkevich,et al.  On a Class of Strategies in General Markov Decision Models , 1974 .

[28]  A. Fiacco Convergence properties of local solutions of sequences of mathematical programming problems in general spaces , 1974 .

[29]  Manfred Schäl  Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal , 1975 .

[30]  Jaap Wessels  Markov games with unbounded rewards , 1976 .

[31]  Awi Federgruen,et al.  Geometric convergence of value-iteration in multichain markov renewal programming , 1977 .

[32]  R. Cowan An introduction to the theory of point processes , 1978 .

[33]  Wolf-Rüdiger Heilmann,et al.  Solving stochastic dynamic programming problems by linear programming — An annotated bibliography , 1978, Z. Oper. Research.

[34]  Ward Whitt,et al.  Approximations of Dynamic Programs, I , 1978, Math. Oper. Res..

[35]  A. Hordijk,et al.  Linear Programming and Markov Decision Chains , 1979 .

[36]  P. Schweitzer,et al.  Geometric convergence of value-iteration in multichain Markov decision problems , 1979, Advances in Applied Probability.

[37]  D. White Finite-state approximations for denumerable-state infinite-horizon discounted Markov decision processes , 1980 .

[38]  W. Whitt Representation and Approximation of Noncooperative Sequential Games , 1980 .

[39]  J. van der Wal On stationary strategies , 1981 .

[40]  Mischa Schwartz,et al.  Optimal fixed frame multiplexing in integrated line- and packet-switched communication networks , 1982, IEEE Trans. Inf. Theory.

[41]  Jan Telgen,et al.  Stochastic Dynamic Programming , 1982 .

[42]  Kamal Golabi,et al.  A Statewide Pavement Management System , 1982 .

[43]  Roger Hartley,et al.  Stochastic Dynamic Programming , 1982 .

[44]  E. Fainberg Non-Randomized Markov and Semi-Markov Strategies in Dynamic Programming , 1982 .

[45]  D. White Finite state approximations for denumerable state infinite horizon discounted Markov decision processes with unbounded rewards , 1982 .

[46]  Aurel A. Lazar,et al.  Optimal flow control of a class of queueing networks in equilibrium , 1983 .

[47]  P. Kanniappan,et al.  Uniform convergence of convex optimization problems , 1983 .

[48]  L. C. M. Kallenberg,et al.  Linear programming and finite Markovian control problems , 1984 .

[49]  E. Fainberg,et al.  Stationary and Markov policies in countable state dynamic programming , 1983 .

[50]  V. Borkar On Minimum Cost Per Unit Time Control of Markov Chains , 1984 .

[51]  Arie Hordijk,et al.  Constrained Undiscounted Stochastic Dynamic Programming , 1984, Math. Oper. Res..

[52]  Lyn C. Thomas,et al.  Finite state approximation algorithms for average cost denumerable state Markov decision processes , 1985 .

[53]  Charles R. Johnson,et al.  Matrix Analysis , 1985 .

[54]  J. Filar,et al.  Gain/variability tradeoffs in undiscounted Markov decision processes , 1985, 1985 24th IEEE Conference on Decision and Control.

[55]  N. Krylov,et al.  Statistics and control of stochastic processes , 1985 .

[56]  Andrzej S. Nowak,et al.  Existence of equilibrium stationary strategies in discounted noncooperative stochastic games with uncountable state space , 1985 .

[57]  F. Beutler,et al.  Optimal policies for controlled markov chains with a constraint , 1985 .

[58]  M. J. Sobel Maximal mean/standard deviation ratio in an undiscounted MDP , 1985 .

[59]  R. Cavazos-Cadena Finite-state approximations for denumerable state discounted markov decision processes , 1986 .

[60]  O. Hernández-Lerma Finite-state approximations for denumerable multidimensional state discounted Markov decision processes , 1986 .

[61]  R. Wets,et al.  Designing approximation schemes for stochastic optimization problems, in particular for stochastic programs with recourse , 1986 .

[62]  F. Beutler,et al.  Time-average optimal constrained semi-Markov decision processes , 1986, Advances in Applied Probability.

[63]  Keith W. Ross,et al.  Optimal priority assignment with hard constraint , 1986 .

[64]  A. A. Pervozvanskiĭ,et al.  Perturbation theory for mathematical programming problems , 1986 .

[65]  D. J. White Utility, probabilistic constraints, mean and variance of discounted rewards in Markov decision processes , 1987 .

[66]  H. Kawai A variance minimization problem for a Markov decision process , 1987 .

[67]  R. Rockafellar Conjugate Duality and Optimization , 1987 .

[68]  F. Vakil,et al.  Flow control protocols for integrated networks with partially observed voice traffic , 1987 .

[69]  Petr Mandl,et al.  On adaptive control of Markov processes , 1987, Kybernetika.

[70]  E. Fainberg Sufficient Classes of Strategies in Discrete Dynamic Programming I: Decomposition of Randomized Strategies and Embedded Models , 1987 .

[71]  Manfred Schäl  Estimation and control in discounted stochastic dynamic programming , 1987 .

[73]  Arie Hordijk,et al.  Average, Sensitive and Blackwell Optimal Policies in Denumerable Markov Decision Chains with Unbounded Rewards , 1988, Math. Oper. Res..

[74]  E. Altman,et al.  Markov optimization problems: state-action frequencies revisited , 1988, Proceedings of the 27th IEEE Conference on Decision and Control.

[75]  Armand M. Makowski,et al.  A class of steering policies under a recurrence condition , 1988, Proceedings of the 27th IEEE Conference on Decision and Control.

[76]  V. Borkar A convex analytic approach to Markov decision processes , 1988 .

[77]  E. A. Fajnberg Sufficient Classes of Strategies in Discrete Dynamic Programming. II: Locally Stationary Strategies , 1988 .

[78]  Keith W. Ross,et al.  Optimal scheduling of interactive and noninteractive traffic in telecommunication systems , 1988 .

[79]  Jerzy A. Filar,et al.  Variance-Penalized Markov Decision Processes , 1989, Math. Oper. Res..

[80]  Linn I. Sennott,et al.  Average Cost Optimal Stationary Policies in Infinite State Markov Decision Processes with Unbounded Costs , 1989, Oper. Res..

[81]  Keith W. Ross,et al.  Randomized and Past-Dependent Policies for Markov Decision Processes with Multiple Constraints , 1989, Oper. Res..

[82]  A. Hordijk,et al.  Constrained admission control to a queueing system , 1989, Advances in Applied Probability.

[84]  K. Ross,et al.  Variability sensitive Markov decision processes , 1989, Proceedings of the 28th IEEE Conference on Decision and Control,.

[85]  Rolando Cavazos-Cadena,et al.  Weak conditions for the existence of optimal stationary policies in average Markov decision chains with unbounded costs , 1989, Kybernetika.

[86]  Keith W. Ross,et al.  Markov Decision Processes with Sample Path Constraints: The Communicating Case , 1989, Oper. Res..

[87]  Adam Shwartz,et al.  Optimal priority assignment: a time sharing approach , 1989 .

[88]  Irwin E. Schochetman Pointwise versions of the maximum theorem with applications in optimization , 1990 .

[89]  A. Shwartz,et al.  Stochastic approximations for finite-state Markov chains , 1990 .

[90]  M. K. Ghosh,et al.  Controlled diffusions with constraints , 1990 .

[91]  T. E. S. Raghavan,et al.  Algorithms for stochastic games — A survey , 1991, ZOR Methods Model. Oper. Res..

[92]  E. Altman,et al.  Markov decision problems and state-action frequencies , 1991 .

[93]  Eugene A. Feinberg,et al.  Non-randomized strategies in stochastic decision processes , 1991, Ann. Oper. Res..

[94]  Keith W. Ross,et al.  Multichain Markov Decision Processes with a Sample Path Constraint: A Decomposition Approach , 1991, Math. Oper. Res..

[95]  E. Altman  Asymptotic properties of constrained Markov Decision Processes , 1991 .

[96]  E. Altman,et al.  Adaptive control of constrained Markov chains: Criteria and policies , 1991 .

[97]  V. Borkar Topics in controlled Markov chains , 1991 .

[98]  A. Shwartz,et al.  Adaptive control of constrained Markov chains , 1991 .

[99]  Robert L. Smith,et al.  Convergence of selections with applications in optimization , 1991 .

[100]  Linn I. Sennott,et al.  Constrained Discounted Markov Decision Chains , 1991, Probability in the Engineering and Informational Sciences.

[101]  Aurel A. Lazar,et al.  Optimal Decentralized Flow Control of Markovian Queueing Networks with Multiple Controllers , 1991, Perform. Evaluation.

[102]  J. Filar,et al.  Some comments on a theorem of Hardy and Littlewood , 1992 .

[103]  P. Bernhard Information and strategies in dynamic games , 1992 .

[104]  O. Hernández-Lerma,et al.  Equivalence of Lyapunov stability criteria in a class of Markov decision processes , 1992 .

[105]  Keith W. Ross,et al.  Variability Sensitive Markov Decision Processes , 1992, Math. Oper. Res..

[106]  Jacqueline Morgan,et al.  Convergences of marginal functions with dependent constraints , 1992 .

[107]  A. Shwartz,et al.  Stochastic approximations and adaptive control of a discrete-time single-server network with random routing , 1992 .

[108]  Armand M. Makowski,et al.  A class of two-dimensional stochastic approximations and steering policies for Markov decision processes , 1992, [1992] Proceedings of the 31st IEEE Conference on Decision and Control.

[109]  R. Cavazos-Cadena Existence of optimal stationary policies in average reward Markov decision processes with a recurrent state , 1992 .

[110]  Rolando Cavazos-Cadena,et al.  Comparing recent assumptions for the existence of average optimal stationary policies , 1992, Oper. Res. Lett..

[111]  J. Lasserre Average optimal stationary policies and linear programming in countable space Markov decision processes , 1992, [1992] Proceedings of the 31st IEEE Conference on Decision and Control.

[112]  Andreas D. Bovopoulos,et al.  The effect of delayed feedback information on network performance , 1992, Ann. Oper. Res..

[113]  Eitan Altman,et al.  Asymptotic properties of constrained Markov Decision Processes , 1993, ZOR Methods Model. Oper. Res..

[114]  Linn I. Sennott,et al.  Constrained Average Cost Markov Decision Chains , 1993, Probability in the Engineering and Informational Sciences.

[115]  Lucchetti Roberto,et al.  CONVERGENCE OF MINIMA OF INTEGRAL FUNCTIONALS, WITH APPLICATIONS TO OPTIMAL CONTROL AND STOCHASTIC OPTIMIZATION , 1993 .

[116]  J. Aubin Optima and Equilibria: An Introduction to Nonlinear Analysis , 1993 .

[117]  Richard L. Tweedie,et al.  Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.

[118]  E. Altman,et al.  Stability and singular perturbations in constrained Markov decision problems , 1993, IEEE Trans. Autom. Control..

[119]  Eitan Altman,et al.  Time-Sharing Policies for Controlled Markov Chains , 1993, Oper. Res..

[120]  M. K. Ghosh,et al.  Discrete-time controlled Markov processes with average cost criterion: a survey , 1993 .

[121]  Jean B. Lasserre,et al.  Linear programming formulation of MDPs in countable state space: The multichain case , 1994, Math. Methods Oper. Res..

[122]  Eitan Altman,et al.  Rate of Convergence of Empirical Measures and Costs in Controlled Markov Chains and Transient Optimality , 1994, Math. Oper. Res..

[123]  Eugene A. Feinberg,et al.  Constrained Semi-Markov decision processes with average rewards , 1994, Math. Methods Oper. Res..

[124]  J. Lasserre Average Optimal Stationary Policies and Linear Programming in Countable Space Markov Decision Processes , 1994 .

[125]  E. Altman,et al.  Approximations In Dynamic Zero-Sum Games , 1994 .

[126]  Nahum Shimkin,et al.  Stochastic Games with Average Cost Constraints , 1994 .

[127]  V. Borkar Ergodic Control of Markov Chains with Constraints---The General Case , 1994 .

[128]  Lodewijk C. M. Kallenberg,et al.  Survey of linear programming for standard and nonstandard Markovian control problems. Part I: Theory , 1994, Math. Methods Oper. Res..

[129]  M. Reiman,et al.  Optimality of Randomized Trunk Reservation , 1994 .

[130]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[131]  Eitan Altman,et al.  Denumerable Constrained Markov Decision Processes and Finite Approximations , 1994, Math. Oper. Res..

[132]  A. B. Piunovskii Control of Random Sequences in Problems with Constraints , 1994 .

[133]  Ying Huang,et al.  On Finding Optimal Policies for Markov Decision Chains: A Unifying Framework for Mean-Variance-Tradeoffs , 1994, Math. Oper. Res..

[134]  Matthew J. Sobel,et al.  Mean-Variance Tradeoffs in an Undiscounted MDP , 1994, Oper. Res..

[135]  Rommert Dekker,et al.  On the Relation Between Recurrence and Ergodicity Properties in Denumerable Markov Decision Chains , 1994, Math. Oper. Res..

[136]  D. J. White A mathematical programming approach to a problem in variance penalised Markov decision processes , 1994 .

[138]  O. Hernández-Lerma,et al.  Discounted Cost Markov Decision Processes on Borel Spaces: The Linear Programming Formulation , 1994 .

[139]  E. Altman,et al.  A Hybrid (Differential-Stochastic) Zero-Sum Game with a Fast Stochastic Part , 1995 .

[140]  Eitan Altman,et al.  The Linear Program approach in multi-chain Markov Decision Processes revisited , 1995, Math. Methods Oper. Res..

[141]  Aurel A. Lazar,et al.  On the existence of equilibria in noncooperative optimal flow control , 1995, JACM.

[142]  Eugene A. Feinberg,et al.  Constrained Markov Decision Models with Weighted Discounted Rewards , 1995, Math. Oper. Res..

[143]  E. Feinberg,et al.  Bicriterion Optimization of an M/G/1 Queue with A Removable Server , 1996 .

[144]  A multicriteria model of optimal control of a stochastic linear system , 1996 .

[145]  Eugene A. Feinberg,et al.  Notes on equivalent stationary policies in Markov decision processes with total rewards , 1996, Math. Methods Oper. Res..

[146]  Eugene A. Feinberg,et al.  Constrained Discounted Dynamic Programming , 1996, Math. Oper. Res..

[147]  Moshe Haviv,et al.  On constrained Markov decision processes , 1996, Oper. Res. Lett..

[148]  Eitan Altman,et al.  Constrained Markov decision processes with total cost criteria: Occupation measures and primal LP , 1996, Math. Methods Oper. Res..

[149]  A. Piunovskiy Optimal Control of Random Sequences in Problems with Constraints , 1997 .

[150]  Linn I. Sennott,et al.  The Computation of Average Optimal Policies in Denumerable State Markov Decision Chains , 1997, Advances in Applied Probability.

[151]  Linn I. Sennott,et al.  On computing average cost optimal policies with application to routing to parallel queues , 1997, Math. Methods Oper. Res..

[152]  A. Hordijk,et al.  Contraction Conditions for Average and α-Discount Optimality in Countable State Markov Games with Unbounded Rewards , 1997, Math. Oper. Res..

[153]  E. Altman,et al.  Approximations in Dynamic Zero-Sum Games II , 1997 .

[154]  W. Fleming Book Review: Discrete-time Markov control processes: Basic optimality criteria , 1997 .

[155]  Daryl J. Daley,et al.  An Introduction to the Theory of Point Processes , 2013 .

[156]  Eitan Altman,et al.  Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program , 1998, Math. Methods Oper. Res..

[157]  Eitan Altman,et al.  Constrained Markov Games: Nash Equilibria , 2000 .

[158]  Eitan Altman,et al.  Continuity of Optimal Values and Solutions for Control of Markov Chains with Constraints , 2000, SIAM J. Control. Optim..

[159]  Andrew G. Glen,et al.  APPL , 2001 .