Q-Learning for Distributionally Robust Markov Decision Processes

In this paper, we consider distributionally robust Markov Decision Processes with Borel state and action spaces and an infinite time horizon. The problem is formulated as a Stackelberg game in which nature, as a second player, chooses the least favorable disturbance density in each scenario. Under suitable assumptions, we prove that the value function is the unique fixed point of an operator and that minimizers and maximizers, respectively, yield optimal policies for the decision maker and for nature. Based on this result, we introduce a Q-learning approach that solves the problem via simulation-based techniques. We prove the convergence of the Q-learning algorithm and study its performance on a distributionally robust irrigation problem.
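To illustrate the kind of simulation-based scheme the abstract describes, here is a minimal tabular sketch of distributionally robust Q-learning. It is an assumption-laden toy, not the paper's method: the paper works with Borel state and action spaces and disturbance densities, whereas below the ambiguity set is shrunk to a small finite family of transition kernels P[k] among which nature, as the second player, picks the least favorable one at every update. All sizes, rewards, and kernels are hypothetical.

```python
import numpy as np

# Minimal tabular sketch of distributionally robust Q-learning.
# Everything below (state/action counts, the reward table, the two candidate
# kernels) is a hypothetical toy instance, not the paper's Borel-space setting.

rng = np.random.default_rng(seed=0)

n_states, n_actions = 4, 2
beta = 0.9                     # discount factor
eps = 0.1                      # exploration rate

# Hypothetical ambiguity set: a finite family of transition kernels,
# P[k, s, a] is a probability vector over next states under model k.
P = rng.dirichlet(np.ones(n_states), size=(2, n_states, n_actions))
r = rng.uniform(size=(n_states, n_actions))   # toy one-step reward

Q = np.zeros((n_states, n_actions))
visits = np.zeros((n_states, n_actions))

s = 0
for _ in range(20_000):
    # epsilon-greedy action for the decision maker (first player)
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())

    # Nature's move (second player): choose the least favorable kernel,
    # i.e. the one minimizing the expected continuation value under Q.
    v = Q.max(axis=1)                          # current value estimate
    k_star = int(np.argmin([P[k, s, a] @ v for k in range(P.shape[0])]))

    # Simulate the transition under the adversarial kernel and update Q
    # with Robbins-Monro step sizes alpha_n = 1 / n(s, a).
    s_next = int(rng.choice(n_states, p=P[k_star, s, a]))
    visits[s, a] += 1
    alpha = 1.0 / visits[s, a]
    Q[s, a] += alpha * (r[s, a] + beta * v[s_next] - Q[s, a])
    s = s_next

print("robust Q-estimates:\n", Q.round(3))
```

The inner argmin is nature's move in the Stackelberg game; the decision maker then learns against the resulting worst-case transitions. Under the usual Robbins-Monro step-size conditions, iterates of this type converge to the fixed point of the associated robust operator, which is the kind of guarantee the paper establishes in far greater generality.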
