Constrained Markov Decision Processes
Eitan Altman

INTRODUCTION
  Examples of Constrained Dynamic Control Problems
  On Solution Approaches for CMDPs with Expected Costs
  Other Types of CMDPs
  Cost Criteria and Assumptions
  The Convex Analytical Approach and Occupation Measures
  Linear Programming and Lagrangian Approach for CMDPs
  About the Methodology
  The Structure of the Book

PART ONE: FINITE MDPS

MARKOV DECISION PROCESSES
  The Model
  Cost Criteria and the Constrained Problem
  Some Notation
  The Dominance of Markov Policies

THE DISCOUNTED COST
  Occupation Measure and the Primal LP
  Dynamic Programming and Dual LP: the Unconstrained Case
  Constrained Control: Lagrangian Approach
  The Dual LP
  Number of Randomizations

THE EXPECTED AVERAGE COST
  Occupation Measure and the Primal LP
  Equivalent Linear Program
  The Dual Program
  Number of Randomizations

FLOW AND SERVICE CONTROL IN A SINGLE-SERVER QUEUE
  The Model
  The Lagrangian
  The Original Constrained Problem
  Structure of Randomization and Implementation Issues
  On Coordination Between Controllers
  Open Questions

PART TWO: INFINITE MDPS

MDPS WITH INFINITE STATE AND ACTION SPACES
  The Model
  Cost Criteria
  Mixed Policies and Topological Structures
  The Dominance of Markov Policies
  Aggregation of States
  Extra Randomization in the Policies
  Equivalent Quasi-Markov Model and Quasi-Markov Policies

THE TOTAL COST: CLASSIFICATION OF MDPS
  Transient and Absorbing MDPs
  MDPs with Uniform Lyapunov Functions
  Equivalence of MDPs with Unbounded and Bounded Costs
  Properties of MDPs with Uniform Lyapunov Functions
  Properties for Fixed Initial Distribution
  Examples of Uniform Lyapunov Functions
  Contracting MDPs

THE TOTAL COST: OCCUPATION MEASURES AND THE PRIMAL LP
  Occupation Measure
  Continuity of Occupation Measures
  More Properties of MDPs
  Characterization of Achievable Sets of Occupation Measures
  Relation Between Cost and Occupation Measure
  Dominating Classes of Policies
  Equivalent Linear Program
  The Dual Program

THE TOTAL COST: DYNAMIC AND LINEAR PROGRAMMING
  Non-Constrained Control: Dynamic and Linear Programming
  Superharmonic Functions and Linear Programming
  Set of Achievable Costs
  Constrained Control: Lagrangian Approach
  The Dual LP
  State Truncation
  A Second LP Approach for Optimal Mixed Policies
  More on Unbounded Costs

THE DISCOUNTED COST
  The Equivalent Total Cost Model
  Occupation Measure and LP
  Non-negative Immediate Cost
  Weak Contracting Assumptions and Lyapunov Functions
  Example: Flow and Service Control

THE EXPECTED AVERAGE COST
  Occupation Measures
  Completeness Properties of Stationary Policies
  Relation Between Cost and Occupation Measure
  Dominating Classes of Policies
  Equivalent Linear Program
  The Dual Program
  The Contracting Framework
  Other Conditions for the Uniform Integrability
  The Case of Uniform Lyapunov Conditions

EXPECTED AVERAGE COST: DYNAMIC PROGRAMMING AND LP
  The Non-Constrained Case: Optimality Inequality
  Non-Constrained Control: Cost Bounded Below
  Dynamic Programming and Uniform Lyapunov Function
  Superharmonic Functions and Linear Programming
  Set of Achievable Costs
  Constrained Control: Lagrangian Approach
  The Dual LP
  A Second LP Approach for Optimal Mixed Policies

PART THREE: ASYMPTOTIC METHODS AND APPROXIMATIONS

SENSITIVITY ANALYSIS
  Introduction
  Approximation of the Values
  Approximation and Robustness of the Policies

CONVERGENCE OF DISCOUNTED CONSTRAINED MDPS
  Convergence in the Discount Factor
  Convergence to the Expected Average Cost
  The Case of Uniform Lyapunov Function

CONVERGENCE AS THE HORIZON TENDS TO INFINITY
  The Discounted Cost
  The Expected Average Cost: Stationary Policies
  The Expected Average Cost: General Policies

STATE TRUNCATION AND APPROXIMATION
  The Approximating Sets of States
  Scheme I: the Total Cost
  Scheme II: the Total Cost
  Scheme III: the Total Cost
  The Expected Average Cost
  Infinite MDPs: on the Number of Randomizations

APPENDIX: CONVERGENCE OF PROBABILITY MEASURES
REFERENCES
LIST OF SYMBOLS AND NOTATION
INDEX
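The occupation-measure linear program that runs through the book can be illustrated on a toy finite discounted CMDP. The sketch below is a minimal, self-contained example with made-up numbers (the transition matrices, costs `c` and `d`, bound `V`, and initial distribution `beta` are all illustrative, not taken from the book), solved with `scipy.optimize.linprog`: the variables are the discounted occupation measure rho(x, a), the equality constraints are the flow-balance equations, one inequality bounds the constrained cost, and an optimal stationary policy is read off by normalizing rho over actions.

```python
import numpy as np
from scipy.optimize import linprog

# Toy discounted CMDP: 2 states, 2 actions (all numbers illustrative).
nS, nA, gamma = 2, 2, 0.9
# P[a][x][y] = probability of moving from state x to state y under action a
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
c = np.array([[1.0, 0.0],    # c[x][a]: immediate cost to minimize
              [2.0, 0.5]])
d = np.array([[0.0, 2.0],    # d[x][a]: immediate cost that is constrained
              [0.0, 1.0]])
V = 3.0                      # bound on the expected discounted d-cost
beta = np.array([1.0, 0.0])  # initial distribution

# Decision variables: occupation measure rho(x, a), flattened as index x*nA + a.
# Flow balance for each state y:
#   sum_a rho(y, a) - gamma * sum_{x,a} P(y|x,a) * rho(x, a) = beta(y)
A_eq = np.zeros((nS, nS * nA))
for y in range(nS):
    for x in range(nS):
        for a in range(nA):
            A_eq[y, x * nA + a] = float(y == x) - gamma * P[a, x, y]
b_eq = beta

# Single constraint: expected discounted d-cost at most V
A_ub = d.reshape(1, -1)
b_ub = np.array([V])

res = linprog(c=c.reshape(-1), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (nS * nA))
rho = res.x.reshape(nS, nA)
# An optimal stationary policy randomizes proportionally to rho:
policy = rho / rho.sum(axis=1, keepdims=True)
print("optimal constrained cost:", res.fun)
print("stationary policy (rows = states):", policy)
```

Note that the occupation measure here is unnormalized, so its total mass is 1/(1-gamma); the number of state-action pairs at which `policy` actually randomizes is bounded by the number of active constraints, which is the "number of randomizations" result discussed in Parts One and Three.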

[1]  M. Fréchet Convergence in probability , 1930 .

[2]  M. Kreĭn,et al.  On extreme points of regular convex sets , 1940 .

[3]  L. Shapley,et al.  Stochastic Games , 1953, Proceedings of the National Academy of Sciences.

[4]  M. Sion On general minimax theorems , 1958 .

[5]  Kai Lai Chung,et al.  Markov Chains with Stationary Transition Probabilities , 1961 .

[6]  A. S. Manne Linear Programming and Sequential Decisions , 1960 .

[8]  S. Friedman On Stochastic Approximations , 1963 .

[9]  F. d'Epenoux,et al.  A Probabilistic Production and Inventory Problem , 1963 .

[10]  Robert J. Aumann,et al.  28. Mixed and Behavior Strategies in Infinite Extensive Games , 1964 .

[11]  C. Derman,et al.  Some Remarks on Finite Horizon Markovian Decision Models , 1965 .

[12]  Onésimo Hernández-Lerma,et al.  Controlled Markov Processes , 1965 .

[13]  C. Derman,et al.  A Note on Memoryless Rules for Controlling Sequential Control Processes , 1966 .

[14]  G. Dantzig,et al.  On the continuity of the minimum set of a continuous function , 1967 .

[15]  E. Denardo,et al.  Multichain Markov Renewal Programs , 1968 .

[16]  S. Ross,et al.  An Example in Denumerable Decision Processes , 1968 .

[17]  L. Fisher,et al.  On Recurrent Denumerable Decision Processes , 1968 .

[18]  J. Kemeny,et al.  Denumerable Markov chains , 1969 .

[19]  K. Hinderer,et al.  Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter , 1970 .

[20]  Cyrus Derman,et al.  Finite State Markovian Decision Processes , 1970 .

[21]  Peter Kolesar,et al.  A Markovian Model for Hospital Admission Scheduling , 1970 .

[22]  E. Denardo On Linear Programming in a Markov Decision Problem , 1970 .

[23]  H. Kushner,et al.  Mathematical programming and the control of Markov chains , 1971 .

[24]  C. Derman,et al.  Constrained Markov Decision Chains , 1972 .

[25]  J. Bather Optimal decision procedures for finite Markov chains. Part II: Communicating systems , 1973, Advances in Applied Probability.

[26]  R. Tyrrell Rockafellar Conjugate Duality and Optimization , 1974 .

[27]  A. A. Yushkevich,et al.  On a Class of Strategies in General Markov Decision Models , 1974 .

[28]  A. Fiacco Convergence properties of local solutions of sequences of mathematical programming problems in general spaces , 1974 .

[29]  Manfred Schäl  Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal , 1975 .

[30]  Jaap Wessels  Markov games with unbounded rewards , 1976 .

[31]  Awi Federgruen,et al.  Geometric convergence of value-iteration in multichain markov renewal programming , 1977 .

[32]  R. Cowan An introduction to the theory of point processes , 1978 .

[33]  Wolf-Rüdiger Heilmann,et al.  Solving stochastic dynamic programming problems by linear programming — An annotated bibliography , 1978, Z. Oper. Research.

[34]  Ward Whitt,et al.  Approximations of Dynamic Programs, I , 1978, Math. Oper. Res..

[35]  A. Hordijk,et al.  Linear Programming and Markov Decision Chains , 1979 .

[36]  P. Schweitzer,et al.  Geometric convergence of value-iteration in multichain Markov decision problems , 1979, Advances in Applied Probability.

[37]  D. White Finite-state approximations for denumerable-state infinite-horizon discounted Markov decision processes , 1980 .

[38]  W. Whitt Representation and Approximation of Noncooperative Sequential Games , 1980 .

[39]  J. van der Wal On stationary strategies , 1981 .

[40]  Mischa Schwartz,et al.  Optimal fixed frame multiplexing in integrated line- and packet-switched communication networks , 1982, IEEE Trans. Inf. Theory.

[41]  Jan Telgen,et al.  Stochastic Dynamic Programming , 1982 .

[42]  Kamal Golabi,et al.  A Statewide Pavement Management System , 1982 .

[43]  Roger Hartley,et al.  Stochastic Dynamic Programming , 1982 .

[44]  E. Fainberg Non-Randomized Markov and Semi-Markov Strategies in Dynamic Programming , 1982 .

[45]  D. White Finite state approximations for denumerable state infinite horizon discounted Markov decision processes with unbounded rewards , 1982 .

[46]  Aurel A. Lazar,et al.  Optimal flow control of a class of queueing networks in equilibrium , 1983 .

[47]  P. Kanniappan,et al.  Uniform convergence of convex optimization problems , 1983 .

[48]  L. C. M. Kallenberg,et al.  Linear programming and finite Markovian control problems , 1984 .

[49]  E. Fainberg,et al.  Stationary and Markov policies in countable state dynamic programming , 1983 .

[50]  V. Borkar On Minimum Cost Per Unit Time Control of Markov Chains , 1984 .

[51]  Arie Hordijk,et al.  Constrained Undiscounted Stochastic Dynamic Programming , 1984, Math. Oper. Res..

[52]  Lyn C. Thomas,et al.  Finite state approximation algorithms for average cost denumerable state Markov decision processes , 1985 .

[53]  Charles R. Johnson,et al.  Matrix Analysis , 1985 .

[54]  J. Filar,et al.  Gain/variability tradeoffs in undiscounted Markov decision processes , 1985, 1985 24th IEEE Conference on Decision and Control.

[55]  N. Krylov,et al.  Statistics and control of stochastic processes , 1985 .

[56]  Andrzej S. Nowak,et al.  Existence of equilibrium stationary strategies in discounted noncooperative stochastic games with uncountable state space , 1985 .

[57]  F. Beutler,et al.  Optimal policies for controlled markov chains with a constraint , 1985 .

[58]  M. J. Sobel Maximal mean/standard deviation ratio in an undiscounted MDP , 1985 .

[59]  R. Cavazos-Cadena Finite-state approximations for denumerable state discounted markov decision processes , 1986 .

[60]  O. Hernández-Lerma Finite-state approximations for denumerable multidimensional state discounted Markov decision processes , 1986 .

[61]  R. Wets,et al.  Designing approximation schemes for stochastic optimization problems, in particular for stochastic programs with recourse , 1986 .

[62]  F. Beutler,et al.  Time-average optimal constrained semi-Markov decision processes , 1986, Advances in Applied Probability.

[63]  Keith W. Ross,et al.  Optimal priority assignment with hard constraint , 1986 .

[64]  A. A. Pervozvanskiĭ,et al.  Perturbation theory for mathematical programming problems , 1986 .

[65]  D. J. White Utility, probabilistic constraints, mean and variance of discounted rewards in Markov decision processes , 1987 .

[66]  H. Kawai A variance minimization problem for a Markov decision process , 1987 .

[67]  R. Rockafellar Conjugate Duality and Optimization , 1987 .

[68]  F. Vakil,et al.  Flow control protocols for integrated networks with partially observed voice traffic , 1987 .

[69]  Petr Mandl,et al.  On adaptive control of Markov processes , 1987, Kybernetika.

[70]  E. Fainberg Sufficient Classes of Strategies in Discrete Dynamic Programming I: Decomposition of Randomized Strategies and Embedded Models , 1987 .

[71]  Manfred Schäl  Estimation and control in discounted stochastic dynamic programming , 1987 .

[73]  Arie Hordijk,et al.  Average, Sensitive and Blackwell Optimal Policies in Denumerable Markov Decision Chains with Unbounded Rewards , 1988, Math. Oper. Res..

[74]  E. Altman,et al.  Markov optimization problems: state-action frequencies revisited , 1988, Proceedings of the 27th IEEE Conference on Decision and Control.

[75]  Armand M. Makowski,et al.  A class of steering policies under a recurrence condition , 1988, Proceedings of the 27th IEEE Conference on Decision and Control.

[76]  V. Borkar A convex analytic approach to Markov decision processes , 1988 .

[77]  E. A. Fajnberg Sufficient Classes of Strategies in Discrete Dynamic Programming. II: Locally Stationary Strategies , 1988 .

[78]  Keith W. Ross,et al.  Optimal scheduling of interactive and noninteractive traffic in telecommunication systems , 1988 .

[79]  Jerzy A. Filar,et al.  Variance-Penalized Markov Decision Processes , 1989, Math. Oper. Res..

[80]  Linn I. Sennott,et al.  Average Cost Optimal Stationary Policies in Infinite State Markov Decision Processes with Unbounded Costs , 1989, Oper. Res..

[81]  Keith W. Ross,et al.  Randomized and Past-Dependent Policies for Markov Decision Processes with Multiple Constraints , 1989, Oper. Res..

[82]  A. Hordijk,et al.  Constrained admission control to a queueing system , 1989, Advances in Applied Probability.

[84]  K. Ross,et al.  Variability sensitive Markov decision processes , 1989, Proceedings of the 28th IEEE Conference on Decision and Control,.

[85]  Rolando Cavazos-Cadena,et al.  Weak conditions for the existence of optimal stationary policies in average Markov decision chains with unbounded costs , 1989, Kybernetika.

[86]  Keith W. Ross,et al.  Markov Decision Processes with Sample Path Constraints: The Communicating Case , 1989, Oper. Res..

[87]  Adam Shwartz,et al.  Optimal priority assignment: a time sharing approach , 1989 .

[88]  Irwin E. Schochetman Pointwise versions of the maximum theorem with applications in optimization , 1990 .

[89]  A. Shwartz,et al.  Stochastic approximations for finite-state Markov chains , 1990 .

[90]  M. K. Ghosh,et al.  Controlled diffusions with constraints , 1990 .

[91]  T. E. S. Raghavan,et al.  Algorithms for stochastic games — A survey , 1991, ZOR Methods Model. Oper. Res..

[92]  E. Altman,et al.  Markov decision problems and state-action frequencies , 1991 .

[93]  Eugene A. Feinberg,et al.  Non-randomized strategies in stochastic decision processes , 1991, Ann. Oper. Res..

[94]  Keith W. Ross,et al.  Multichain Markov Decision Processes with a Sample Path Constraint: A Decomposition Approach , 1991, Math. Oper. Res..

[95]  E. Altman  Asymptotic properties of constrained Markov Decision Processes , 1991 .

[96]  E. Altman,et al.  Adaptive control of constrained Markov chains: Criteria and policies , 1991 .

[97]  V. Borkar Topics in controlled Markov chains , 1991 .

[98]  A. Shwartz,et al.  Adaptive control of constrained Markov chains , 1991 .

[99]  Robert L. Smith,et al.  Convergence of selections with applications in optimization , 1991 .

[100]  Linn I. Sennott,et al.  Constrained Discounted Markov Decision Chains , 1991, Probability in the Engineering and Informational Sciences.

[101]  Aurel A. Lazar,et al.  Optimal Decentralized Flow Control of Markovian Queueing Networks with Multiple Controllers , 1991, Perform. Evaluation.

[102]  J. Filar,et al.  Some comments on a theorem of Hardy and Littlewood , 1992 .

[103]  P. Bernhard Information and strategies in dynamic games , 1992 .

[104]  O. Hernández-Lerma,et al.  Equivalence of Lyapunov stability criteria in a class of Markov decision processes , 1992 .

[105]  Keith W. Ross,et al.  Variability Sensitive Markov Decision Processes , 1992, Math. Oper. Res..

[106]  Jacqueline Morgan,et al.  Convergences of marginal functions with dependent constraints , 1992 .

[107]  A. Shwartz,et al.  Stochastic approximations and adaptive control of a discrete-time single-server network with random routing , 1992 .

[108]  Armand M. Makowski,et al.  A class of two-dimensional stochastic approximations and steering policies for Markov decision processes , 1992, [1992] Proceedings of the 31st IEEE Conference on Decision and Control.

[109]  R. Cavazos-Cadena Existence of optimal stationary policies in average reward Markov decision processes with a recurrent state , 1992 .

[110]  Rolando Cavazos-Cadena,et al.  Comparing recent assumptions for the existence of average optimal stationary policies , 1992, Oper. Res. Lett..

[111]  J. Lasserre Average optimal stationary policies and linear programming in countable space Markov decision processes , 1992, [1992] Proceedings of the 31st IEEE Conference on Decision and Control.

[112]  Andreas D. Bovopoulos,et al.  The effect of delayed feedback information on network performance , 1992, Ann. Oper. Res..

[113]  Eitan Altman,et al.  Asymptotic properties of constrained Markov Decision Processes , 1993, ZOR Methods Model. Oper. Res..

[114]  Linn I. Sennott,et al.  Constrained Average Cost Markov Decision Chains , 1993, Probability in the Engineering and Informational Sciences.

[115]  Lucchetti Roberto,et al.  CONVERGENCE OF MINIMA OF INTEGRAL FUNCTIONALS, WITH APPLICATIONS TO OPTIMAL CONTROL AND STOCHASTIC OPTIMIZATION , 1993 .

[116]  J. Aubin Optima and Equilibria: An Introduction to Nonlinear Analysis , 1993 .

[117]  Richard L. Tweedie,et al.  Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.

[118]  E. Altman,et al.  Stability and singular perturbations in constrained Markov decision problems , 1993, IEEE Trans. Autom. Control..

[119]  Eitan Altman,et al.  Time-Sharing Policies for Controlled Markov Chains , 1993, Oper. Res..

[120]  M. K. Ghosh,et al.  Discrete-time controlled Markov processes with average cost criterion: a survey , 1993 .

[121]  Jean B. Lasserre,et al.  Linear programming formulation of MDPs in countable state space: The multichain case , 1994, Math. Methods Oper. Res..

[122]  Eitan Altman,et al.  Rate of Convergence of Empirical Measures and Costs in Controlled Markov Chains and Transient Optimality , 1994, Math. Oper. Res..

[123]  Eugene A. Feinberg,et al.  Constrained Semi-Markov decision processes with average rewards , 1994, Math. Methods Oper. Res..

[124]  J. Lasserre Average Optimal Stationary Policies and Linear Programming in Countable Space Markov Decision Processes , 1994 .

[125]  E. Altman,et al.  Approximations In Dynamic Zero-Sum Games , 1994 .

[126]  Nahum Shimkin,et al.  Stochastic Games with Average Cost Constraints , 1994 .

[127]  V. Borkar Ergodic Control of Markov Chains with Constraints---The General Case , 1994 .

[128]  Lodewijk C. M. Kallenberg,et al.  Survey of linear programming for standard and nonstandard Markovian control problems. Part I: Theory , 1994, Math. Methods Oper. Res..

[129]  M. Reiman,et al.  Optimality of Randomized Trunk Reservation , 1994 .

[130]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[131]  Eitan Altman,et al.  Denumerable Constrained Markov Decision Processes and Finite Approximations , 1994, Math. Oper. Res..

[132]  A. B. Piunovskii Control of Random Sequences in Problems with Constraints , 1994 .

[133]  Ying Huang,et al.  On Finding Optimal Policies for Markov Decision Chains: A Unifying Framework for Mean-Variance-Tradeoffs , 1994, Math. Oper. Res..

[134]  Matthew J. Sobel,et al.  Mean-Variance Tradeoffs in an Undiscounted MDP , 1994, Oper. Res..

[135]  Rommert Dekker,et al.  On the Relation Between Recurrence and Ergodicity Properties in Denumerable Markov Decision Chains , 1994, Math. Oper. Res..

[136]  D. J. White A mathematical programming approach to a problem in variance penalised Markov decision processes , 1994 .

[138]  O. Hernández-Lerma,et al.  Discounted Cost Markov Decision Processes on Borel Spaces: The Linear Programming Formulation , 1994 .

[139]  E. Altman,et al.  A Hybrid (Differential-Stochastic) Zero-Sum Game with a Fast Stochastic Part , 1995 .

[140]  Eitan Altman,et al.  The Linear Program approach in multi-chain Markov Decision Processes revisited , 1995, Math. Methods Oper. Res..

[141]  Aurel A. Lazar,et al.  On the existence of equilibria in noncooperative optimal flow control , 1995, JACM.

[142]  Eugene A. Feinberg,et al.  Constrained Markov Decision Models with Weighted Discounted Rewards , 1995, Math. Oper. Res..

[143]  E. Feinberg,et al.  Bicriterion Optimization of an M/G/1 Queue with A Removable Server , 1996 .

[144]  A multicriteria model of optimal control of a stochastic linear system , 1996 .

[145]  Eugene A. Feinberg,et al.  Notes on equivalent stationary policies in Markov decision processes with total rewards , 1996, Math. Methods Oper. Res..

[146]  Eugene A. Feinberg,et al.  Constrained Discounted Dynamic Programming , 1996, Math. Oper. Res..

[147]  Moshe Haviv,et al.  On constrained Markov decision processes , 1996, Oper. Res. Lett..

[148]  Eitan Altman,et al.  Constrained Markov decision processes with total cost criteria: Occupation measures and primal LP , 1996, Math. Methods Oper. Res..

[149]  A. Piunovskiy Optimal Control of Random Sequences in Problems with Constraints , 1997 .

[150]  Linn I. Sennott,et al.  The Computation of Average Optimal Policies in Denumerable State Markov Decision Chains , 1997, Advances in Applied Probability.

[151]  Linn I. Sennott,et al.  On computing average cost optimal policies with application to routing to parallel queues , 1997, Math. Methods Oper. Res..

[152]  A. Hordijk,et al.  Contraction Conditions for Average and α-Discount Optimality in Countable State Markov Games with Unbounded Rewards , 1997, Math. Oper. Res..

[153]  E. Altman,et al.  Approximations in Dynamic Zero-Sum Games II , 1997 .

[154]  W. Fleming Book Review: Discrete-time Markov control processes: Basic optimality criteria , 1997 .

[155]  Daryl J. Daley,et al.  An Introduction to the Theory of Point Processes , 2013 .

[156]  Eitan Altman,et al.  Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program , 1998, Math. Methods Oper. Res..

[157]  Eitan Altman,et al.  Constrained Markov Games: Nash Equilibria , 2000 .

[158]  Eitan Altman,et al.  Continuity of Optimal Values and Solutions for Control of Markov Chains with Constraints , 2000, SIAM J. Control. Optim..

[159]  Andrew G. Glen,et al.  APPL , 2001 .