Computational Methods for Risk-Averse Undiscounted Transient Markov Models

The total cost problem for discrete-time controlled transient Markov models is considered. The objective functional is a Markov dynamic risk measure of the total cost. Two solution methods, value and policy iteration, are proposed, and their convergence is analyzed. In the policy iteration method, we propose two algorithms for policy evaluation: the nonsmooth Newton method and convex programming, and we prove their convergence. The results are illustrated on a credit limit control problem.
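As a rough illustration of the value iteration idea, the following minimal sketch replaces the expected value in the usual dynamic programming operator with a one-step risk mapping. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the array layout (`P[u, x, y]` with the absorbing, zero-cost state stored at the last index), and the choice of a first-order mean-upper-semideviation with weight `kappa` as the risk mapping. Convergence is only expected when the model is transient in a suitable risk-averse sense.

```python
import numpy as np


def mean_semideviation(p, v, kappa):
    """Mean plus kappa times the expected upper deviation from the mean (0 <= kappa <= 1)."""
    mean = p @ v
    return mean + kappa * (p @ np.maximum(v - mean, 0.0))


def risk_averse_value_iteration(P, c, kappa=0.5, tol=1e-10, max_iter=10_000):
    """Hypothetical sketch of risk-averse value iteration.

    P has shape (n_actions, n, n + 1): transition probabilities from each of
    the n transient states to the n transient states plus one absorbing
    state (last index). c has shape (n, n_actions): one-step costs.
    Returns the value estimates on the transient states and a greedy policy.
    """
    n_actions, n, _ = P.shape
    v = np.zeros(n + 1)  # value of the absorbing state stays 0
    for _ in range(max_iter):
        # q[x, u] = cost now + risk-adjusted evaluation of the next-state value
        q = np.array([
            [c[x, u] + mean_semideviation(P[u, x], v, kappa) for u in range(n_actions)]
            for x in range(n)
        ])
        v_new = np.append(q.min(axis=1), 0.0)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return v[:n], q.argmin(axis=1)
```

With `kappa=0.0` the mapping reduces to the plain expectation, so the sketch falls back to ordinary expected total cost value iteration; larger values of `kappa` penalize outcomes above the mean and can change which action is preferred.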
