Risk-Averse Decision Making Under Uncertainty

A large class of decision-making problems under uncertainty can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs), with applications in artificial intelligence and operations research, among others. Traditionally, policy synthesis techniques are proposed such that a total expected cost or reward is minimized or maximized. However, optimality in the total expected cost sense is only reasonable if the system's behavior over a large number of runs is of interest, which has limited the use of such policies in practical mission-critical scenarios, wherein large deviations from the expected behavior may lead to mission failure. In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures, which we refer to as the constrained risk-averse problem. Our contributions are fourfold: (i) For MDPs, we reformulate the problem into an inf-sup problem via the Lagrangian framework; under the assumption that the risk objectives and constraints can be represented by a Markov risk transition mapping, we propose an optimization-based method to synthesize Markovian policies. (ii) For MDPs, we demonstrate that the formulated optimization problems are in the form of difference convex programs (DCPs) and can be solved by the disciplined convex-concave programming (DCCP) framework; we show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints. (iii) For POMDPs, we show that, if the coherent risk measures can be defined as a Markov risk transition mapping, an infinite-dimensional optimization can be used to design Markovian belief-based policies. (iv) For POMDPs restricted to stochastic finite-state controllers (FSCs), we show that the latter optimization simplifies to a (finite-dimensional) DCP and can be solved by the DCCP framework. We incorporate these DCPs in a policy iteration algorithm to design risk-averse FSCs for POMDPs. We demonstrate the efficacy of the proposed method with numerical experiments involving the conditional value-at-risk (CVaR) and entropic value-at-risk (EVaR) risk measures.
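For context, CVaR at level ε ∈ (0, 1] admits the standard variational form CVaR_ε(X) = inf_z { z + ε^{-1} E[(X − z)_+] }, i.e., the mean of the worst ε-fraction of loss outcomes, and EVaR upper-bounds CVaR at the same level via the moment generating function of X. The sketch below is not the paper's actual MDP program; it only illustrates the kind of difference convex problem the DCCP framework handles, using the open-source cvxpy and dccp Python packages. The variable pi, the cost vector c, and the concave regularizer are hypothetical stand-ins for a randomized policy at a single state.

    import cvxpy as cp
    import dccp  # registers the "dccp" solve method with CVXPY
    import numpy as np

    # Hypothetical randomized policy over three actions at one state.
    pi = cp.Variable(3, nonneg=True)
    c = np.array([1.0, 2.0, 0.5])  # illustrative per-action costs

    # Difference convex objective: an affine expected-cost term minus a
    # convex term, so the overall objective is concave (a valid DC form).
    # The -sum_squares term is purely illustrative; it pushes the policy
    # toward a deterministic vertex of the probability simplex.
    objective = cp.Minimize(c @ pi - cp.sum_squares(pi))
    constraints = [cp.sum(pi) == 1]
    prob = cp.Problem(objective, constraints)

    print("DCP? ", prob.is_dcp())       # False: objective is not convex
    print("DCCP?", dccp.is_dccp(prob))  # True: difference of convex functions
    prob.solve(method="dccp")           # convex-concave procedure
    print("policy:", pi.value)

Note that the convex-concave procedure behind DCCP is a local heuristic, so it returns a stationary point rather than a certified global optimum; this matches the abstract's use of DCCP for the non-convex programs arising from the risk objectives and constraints.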
