Markov decision processes under ambiguity

We consider statistical Markov Decision Processes in which the decision maker is risk averse against model ambiguity. The ambiguity is represented by an unknown parameter that influences the transition law and the cost functions. Risk aversion is measured either by the entropic risk measure or by the Average Value at Risk. We show how to solve this kind of problem using a general minimax theorem. Under some continuity and compactness assumptions we prove the existence of an optimal (deterministic) policy and discuss its computation. We illustrate our results using an example from statistical decision theory.
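To make the two risk measures concrete, the following is a minimal sketch and not the method of the paper. It assumes a finite set of candidate parameter values, a hypothetical list `costs` holding the expected cost of one fixed policy under each value, and hypothetical prior weights `probs`. The entropic risk measure is taken as (1/gamma) log E[exp(gamma X)] with gamma > 0, and the Average Value at Risk at level alpha in [0, 1) as the minimum over q of q + E[(X - q)^+] / (1 - alpha); both conventions are assumptions, since the abstract does not fix them.

```python
import math


def entropic_risk(costs, probs, gamma):
    """Entropic risk (1/gamma) * log E[exp(gamma * X)] for a discrete cost X.

    Assumes gamma > 0 (risk aversion for costs). Uses the log-sum-exp
    shift for numerical stability.
    """
    shift = max(gamma * c for c in costs)
    total = sum(p * math.exp(gamma * c - shift) for c, p in zip(costs, probs))
    return (shift + math.log(total)) / gamma


def average_value_at_risk(costs, probs, alpha):
    """AVaR at level alpha in [0, 1) via min_q { q + E[(X - q)^+] / (1 - alpha) }.

    For a discrete distribution the objective is piecewise linear and convex
    in q, so the minimum is attained at one of the support points.
    """
    def objective(q):
        excess = sum(p * max(c - q, 0.0) for c, p in zip(costs, probs))
        return q + excess / (1.0 - alpha)

    return min(objective(q) for q in costs)


# Hypothetical illustration: three candidate parameter values.
costs = [1.0, 2.0, 4.0]   # expected cost of a fixed policy under each value
probs = [0.5, 0.3, 0.2]   # assumed prior weights on the parameter values

print(entropic_risk(costs, probs, gamma=1.0))
print(average_value_at_risk(costs, probs, alpha=0.5))
```

Both functions return at least the plain expected cost, which reflects aversion to the uncertainty about the parameter. Minimizing such a quantity over policies is the optimization problem the abstract refers to, and that step is not attempted here.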
