Discrete-time controlled Markov processes with average cost criterion: a survey

This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and covers a variety of methodologies for finding and characterizing optimal policies. A brief historical perspective on the research efforts in this area is included, together with a substantial, though not exhaustive, bibliography. Several important questions that remain open to investigation are also identified.
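
To make the criterion in the title concrete, a standard formulation of the long-run expected average cost is sketched below; the notation (state x_t, action a_t, one-stage cost c) is generic and illustrative rather than taken verbatim from the survey. For a control policy \pi and initial state x,

J(x, \pi) \;=\; \limsup_{n \to \infty} \frac{1}{n}\, E_x^{\pi}\!\left[\sum_{t=0}^{n-1} c(x_t, a_t)\right],

and the average cost control problem is to find a policy \pi^* attaining \inf_{\pi} J(x, \pi) for every initial state x.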
