Risk-Constrained Markov Decision Processes

We propose a new constrained Markov decision process framework with risk-type constraints. The risk metric we use is Conditional Value-at-Risk (CVaR), which is gaining popularity in finance. CVaR is a conditional expectation, but the conditioning event is defined by a given tail-probability level rather than by a fixed threshold. We propose an iterative offline algorithm to find the risk-constrained optimal control policy. A 'learning' variant inspired by stochastic approximation is also sketched.
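To make the "conditional expectation over a tail-probability level" concrete, here is a minimal sketch that computes CVaR for an empirical sample of losses in two equivalent ways. It is an illustration only, not the paper's algorithm. The following are assumptions: larger values are worse (losses), all samples are equally weighted, and alpha is the confidence level, so the tail has probability mass 1 - alpha. The function names are hypothetical. The first function averages the worst (1 - alpha) share of probability mass and splits a sample's mass when (1 - alpha) * n is not an integer. The second uses Rockafellar and Uryasev's representation CVaR_alpha(X) = min_c { c + E[(X - c)^+] / (1 - alpha) }.

```python
from typing import Sequence


def cvar_tail_average(losses: Sequence[float], alpha: float) -> float:
    """Average of the worst (1 - alpha) share of probability mass.

    Assumes equally weighted samples; a sample's mass 1/n is split when
    the tail boundary falls inside it.
    """
    if not 0.0 <= alpha < 1.0 or not losses:
        raise ValueError("need alpha in [0, 1) and at least one sample")
    n = len(losses)
    mass = 1.0 - alpha          # tail probability to cover
    remaining, total = mass, 0.0
    for x in sorted(losses, reverse=True):   # worst losses first
        w = min(1.0 / n, remaining)          # take all or part of this atom
        total += w * x
        remaining -= w
        if remaining <= 0.0:
            break
    return total / mass


def cvar_rockafellar_uryasev(losses: Sequence[float], alpha: float) -> float:
    """CVaR via min over c of c + E[(X - c)^+] / (1 - alpha).

    For an empirical distribution the objective is convex and piecewise
    linear in c, with kinks only at sample values, so its minimum is
    attained at one of the samples; we evaluate every candidate.
    """
    if not 0.0 <= alpha < 1.0 or not losses:
        raise ValueError("need alpha in [0, 1) and at least one sample")
    n = len(losses)

    def objective(c: float) -> float:
        expected_excess = sum(max(x - c, 0.0) for x in losses) / n
        return c + expected_excess / (1.0 - alpha)

    return min(objective(c) for c in losses)


if __name__ == "__main__":
    sample = [float(k) for k in range(1, 11)]   # losses 1, 2, ..., 10
    # Tail of mass 0.25 covers 10, 9 and half of the atom at 8.
    print(round(cvar_tail_average(sample, 0.75), 6),
          round(cvar_rockafellar_uryasev(sample, 0.75), 6))  # 9.2 9.2
```

The minimization form is useful because it turns a CVaR constraint into an expectation of a convex function with one extra scalar variable. That property is what makes CVaR convenient inside optimization and constrained-control formulations. The tail-average form is easier to read as a conditional expectation.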
