Quantile Markov Decision Process

In this paper, we consider the problem of optimizing the quantiles of the cumulative rewards of Markov Decision Processes (MDPs), which we refer to as Quantile Markov Decision Processes (QMDPs). Traditionally, the goal of an MDP is to maximize the expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. We provide analytical results characterizing the optimal QMDP solution and present algorithms for solving the QMDP. We illustrate the model with two experiments: a grid game and an HIV optimal treatment experiment.
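To make the difference between the two objectives concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a hypothetical single-state problem with two actions ("safe" and "risky"), made-up reward distributions, a horizon of 5, and policies that simply repeat one action. It computes the exact distribution of the cumulative reward and compares the action preferred by the expectation with the action preferred by the median (the 0.5 quantile).

```python
from collections import defaultdict

# Hypothetical per-step reward distributions {reward: probability}.
# These numbers are illustrative assumptions, not taken from the paper.
REWARDS = {
    "safe": {1.0: 1.0},
    "risky": {15.0: 0.1, 0.0: 0.9},
}
HORIZON = 5  # assumed finite horizon


def cumulative_distribution(action, horizon):
    """Exact distribution of the total reward when `action` is repeated `horizon` times."""
    dist = {0.0: 1.0}
    for _ in range(horizon):
        nxt = defaultdict(float)
        for total, p in dist.items():
            for reward, q in REWARDS[action].items():
                nxt[total + reward] += p * q
        dist = dict(nxt)
    return dist


def expectation(dist):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(x * p for x, p in dist.items())


def quantile(dist, tau):
    """Smallest x such that P(X <= x) >= tau."""
    cum = 0.0
    for x in sorted(dist):
        cum += dist[x]
        if cum >= tau:
            return x
    return max(dist)  # guard against floating-point round-off


def best_action(score):
    """Action whose cumulative-reward distribution maximizes `score`."""
    return max(REWARDS, key=lambda a: score(cumulative_distribution(a, HORIZON)))


best_by_mean = best_action(expectation)
best_by_median = best_action(lambda d: quantile(d, 0.5))
print("expectation prefers:", best_by_mean)
print("median prefers:", best_by_median)
```

Under these assumed numbers the risky action has the higher expected cumulative reward (7.5 versus 5), yet it returns nothing with probability 0.9^5 ≈ 0.59, so its median is 0 and the median criterion prefers the safe action. The sketch only evaluates two fixed policies; finding an optimal quantile policy over all policies is the harder problem the paper addresses.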
