Risk-Constrained Markov Decision Processes

We propose a new constrained Markov decision process framework with risk-type constraints. The risk metric we use is Conditional Value-at-Risk (CVaR), which is gaining popularity in finance. It is a conditional expectation, but the conditioning is defined in terms of the level of the tail probability. We propose an iterative offline algorithm to find the risk-constrained optimal control policy. A stochastic approximation-inspired 'learning' variant is also sketched.
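To make the description of CVaR concrete, the following is a minimal sketch of an empirical VaR and CVaR computation on a finite sample. It is not the paper's algorithm. It assumes the samples are losses (larger is worse) and uses the standard representation of CVaR at level α as VaR plus the expected excess over VaR divided by 1 − α. The function names `empirical_var` and `empirical_cvar` are illustrative only.

```python
import math


def empirical_var(losses, alpha):
    """Smallest sample value x whose empirical P(X <= x) is at least alpha."""
    xs = sorted(losses)
    k = math.ceil(alpha * len(xs))  # 1-based rank of the alpha-quantile
    return xs[max(k, 1) - 1]


def empirical_cvar(losses, alpha):
    """CVaR as VaR + E[(X - VaR)^+] / (1 - alpha), for alpha in (0, 1)."""
    var = empirical_var(losses, alpha)
    # Average excess over VaR, taken over the whole sample (not just the tail).
    excess = sum(max(x - var, 0.0) for x in losses) / len(losses)
    return var + excess / (1.0 - alpha)


if __name__ == "__main__":
    sample = list(range(1, 11))  # hypothetical losses 1, 2, ..., 10
    print(empirical_var(sample, 0.8), empirical_cvar(sample, 0.8))
```

For the hypothetical losses 1, ..., 10 and α = 0.8, the sketch gives a VaR of 8 and a CVaR of 9.5 (up to floating-point rounding), which is the mean of the two worst outcomes: the conditioning event is fixed by the tail probability 1 − α rather than by a threshold chosen in advance.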

Vivek S. Borkar | Rahul Jain

[1] A. Charnes, et al. Cost Horizons and Certainty Equivalents: An Approach to Stochastic Programming of Heating Oil, 1958.

[2] A. S. Manne. Linear Programming and Sequential Decisions, 1960.

[3] J. Hale, et al. Stability of Motion, 1964.

[4] C. Derman, et al. Some Remarks on Finite Horizon Markovian Decision Models, 1965.

[5] Cyrus Derman, et al. Finite State Markovian Decision Processes, 1970.

[6] A. Hordijk, et al. Linear Programming and Markov Decision Chains, 1979.

[7] F. Beutler, et al. Optimal policies for controlled Markov chains with a constraint, 1985.

[8] V. Borkar. A convex analytic approach to Markov decision processes, 1988.

[9] P. Varaiya, et al. Stochastic Dynamic Optimization Approaches and Computation, 1988.

[10] Keith W. Ross, et al. Randomized and Past-Dependent Policies for Markov Decision Processes with Multiple Constraints, 1989, Oper. Res.

[11] A. Shwartz, et al. Stochastic approximations for finite-state Markov chains, 1990.

[12] Keith W. Ross, et al. Multichain Markov Decision Processes with a Sample Path Constraint: A Decomposition Approach, 1991, Math. Oper. Res.

[13] Eugene A. Feinberg, et al. Constrained Semi-Markov decision processes with average rewards, 1994, Math. Methods Oper. Res.

[14] Eitan Altman, et al. The Linear Program approach in multi-chain Markov Decision Processes revisited, 1995, Math. Methods Oper. Res.

[15] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.

[16] Roger J.-B. Wets. Challenges in stochastic programming, 1996, Math. Program.

[17] Eugene A. Feinberg, et al. Constrained Discounted Dynamic Programming, 1996, Math. Oper. Res.

[18] Moshe Haviv, et al. On constrained Markov decision processes, 1996, Oper. Res. Lett.

[19] C. Klüppelberg, et al. Modelling Extremal Events, 1997.

[20] John R. Birge, et al. Introduction to Stochastic Programming, 1997.

[21] V. Borkar. Stochastic approximation with two time scales, 1997.

[22] Vivek S. Borkar, et al. Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms, 1997, SIAM J. Control. Optim.

[23] J Figueira, et al. Stochastic Programming, 1998, J. Oper. Res. Soc.

[24] E. Altman. Constrained Markov Decision Processes, 1999.

[25] G. Pflug. Some Remarks on the Value-at-Risk and the Conditional Value-at-Risk, 2000.

[26] R. Rockafellar, et al. Optimization of conditional value-at-risk, 2000.

[27] P. Krokhmal, et al. Portfolio optimization with conditional value-at-risk objective and constraints, 2001.

[28] Paul R. Milgrom, et al. Envelope Theorems for Arbitrary Choice Sets, 2002.

[29] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.

[30] Vivek S. Borkar, et al. An actor-critic algorithm for constrained Markov decision processes, 2005, Syst. Control. Lett.

[31] A. Ruszczynski, et al. Portfolio optimization with stochastic dominance constraints, 2006.

[32] Alexander Shapiro, et al. Convex Approximations of Chance Constrained Programs, 2006, SIAM J. Optim.

[33] A. Ruszczynski, et al. Optimization of Risk Measures, 2006.

[34] David Heath, et al. Coherent multiperiod risk adjusted values and Bellman’s principle, 2007, Ann. Oper. Res.

[35] Darinka Dentcheva, et al. Stochastic Dynamic Optimization with Discounted Stochastic Dominance Constraints, 2008, SIAM J. Control. Optim.

[36] Hua He, et al. Optimal Dynamic Trading Strategies with Risk Limits, 2001, Oper. Res.

[37] Alexander Shapiro, et al. Stochastic programming approach to optimization under uncertainty, 2007, Math. Program.

[38] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.

[39] Csaba I. Fábián, et al. Algorithms for handling CVaR-constraints in dynamic stochastic programming models with applications to finance, 2008.

[40] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control 3rd Edition, Volume II, 2010.

[41] Andrzej Ruszczynski, et al. Risk-averse dynamic programming for Markov decision processes, 2010, Math. Program.

[42] Pravin Varaiya, et al. Smart Operation of Smart Grid: Risk-Limiting Dispatch, 2011, Proceedings of the IEEE.

[43] Ram Rajagopal, et al. Risk-limiting dispatch for integrating renewable power, 2013.

[44] András Prékopa, et al. On Probabilistic Constrained Programming, 2015.