Regret based Robust Solutions for Uncertain Markov Decision Processes

In this paper, we seek robust policies for uncertain Markov Decision Processes (MDPs). Most robust optimization approaches for these problems have focused on computing maximin policies, which maximize the value corresponding to the worst realization of the uncertainty. Recent work has proposed minimax regret, which minimizes the largest gap between a policy's value and the optimal value across realizations of the uncertainty, as a suitable alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only. We provide algorithms that employ sampling to improve along multiple dimensions: (a) handling uncertainty over both transition and reward models; (b) handling dependence of model uncertainties across state-action pairs and decision epochs; and (c) scalability and quality bounds. To demonstrate the empirical effectiveness of our sampling approaches, we compare against benchmark algorithms on two domains from the literature. Finally, we provide a Sample Average Approximation (SAA) analysis to compute a posteriori error bounds.
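The contrast between the maximin and minimax-regret objectives over sampled models can be illustrated with a small brute-force sketch. This is not the paper's algorithm. It assumes an infinite-horizon discounted MDP, a hypothetical uncertainty model (`sample_mdp`) that draws random transition rows and rewards independently for each state-action pair, a fixed start state, and exhaustive enumeration of deterministic stationary policies. All function names are illustrative. For sample k, the regret of a policy π is v*_k(s0) − v^π_k(s0). The sampled minimax-regret policy minimizes the largest such regret, while the sampled maximin policy maximizes the smallest value.

```python
import itertools
import random


def sample_mdp(n_states, n_actions, rng):
    """Draw one MDP realization (hypothetical uncertainty model for illustration).

    P[s][a] is a probability vector over next states; R[s][a] is a reward in [0, 1).
    Both transitions and rewards are uncertain here and are drawn independently.
    """
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            weights = [rng.random() + 1e-9 for _ in range(n_states)]
            total = sum(weights)
            P_s.append([w / total for w in weights])
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R


def q_value(P, R, v, s, a, gamma):
    """One-step lookahead: immediate reward plus discounted expected next value."""
    return R[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))


def evaluate(policy, P, R, gamma, iters=500):
    """Iterative policy evaluation of a deterministic stationary policy."""
    v = [0.0] * len(P)
    for _ in range(iters):
        v = [q_value(P, R, v, s, policy[s], gamma) for s in range(len(P))]
    return v


def optimal_value(P, R, gamma, iters=500):
    """Value iteration: best achievable value if this realization were known."""
    v = [0.0] * len(P)
    for _ in range(iters):
        v = [max(q_value(P, R, v, s, a, gamma) for a in range(len(P[s])))
             for s in range(len(P))]
    return v


def robust_policies(samples, n_states, n_actions, gamma, start=0):
    """Brute-force comparison of sampled maximin vs. sampled minimax regret."""
    best = [optimal_value(P, R, gamma)[start] for P, R in samples]
    policies = list(itertools.product(range(n_actions), repeat=n_states))
    worst_value, max_regret = {}, {}
    for pi in policies:
        vals = [evaluate(pi, P, R, gamma)[start] for P, R in samples]
        worst_value[pi] = min(vals)  # maximin criterion over the samples
        max_regret[pi] = max(b - v for b, v in zip(best, vals))  # regret criterion
    maximin = max(policies, key=worst_value.get)
    minimax_regret = min(policies, key=max_regret.get)
    return maximin, minimax_regret, worst_value, max_regret


rng = random.Random(0)
samples = [sample_mdp(3, 2, rng) for _ in range(20)]
mm, mr, worst, regret = robust_policies(samples, n_states=3, n_actions=2, gamma=0.9)
print("maximin policy:", mm, "worst sampled value:", round(worst[mm], 3))
print("minimax-regret policy:", mr, "max sampled regret:", round(regret[mr], 3))
```

Enumerating all |A|^|S| deterministic policies is exponential in the number of states, so this sketch only illustrates the two objectives on a toy problem. It does not model coupling across state-action pairs or decision epochs, and it does not reflect how the approaches described above scale.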