Solving Uncertain MDPs with Objectives that Are Separable over Instantiations of Model Uncertainty

Markov Decision Problems (MDPs) offer an effective mechanism for planning under uncertainty. However, due to unavoidable uncertainty over models, it is often difficult to obtain an exact specification of an MDP. We are interested in solving MDPs whose transition and reward functions are not exactly specified. Existing research has primarily focused on computing infinite-horizon stationary policies that optimize robustness, regret, or percentile-based objectives. We focus specifically on finite-horizon problems, with a special emphasis on objectives that are separable over individual instantiations of model uncertainty (i.e., objectives that can be expressed as a sum over instantiations of model uncertainty): (a) First, we identify two separable objectives for uncertain MDPs: Average Value Maximization (AVM) and Confidence Probability Maximization (CPM). (b) Second, we provide optimization-based solutions to compute policies for uncertain MDPs with such objectives. In particular, we exploit the separability of the AVM and CPM objectives by employing Lagrangian dual decomposition (LDD). (c) Finally, we demonstrate the utility of the LDD approach on a benchmark problem from the literature.
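To make the notion of separability concrete, the following is a minimal sketch (not the paper's exact formulation) of how an AVM-style objective decomposes under Lagrangian dual decomposition. The symbols are illustrative assumptions: Q denotes a finite set of sampled instantiations of the uncertain transition/reward model, V^q(pi) the finite-horizon value of policy pi under instantiation q, and lambda^q the Lagrange multipliers.

% Illustrative sketch only; the paper's exact notation may differ.
% AVM: maximize the average value over the sampled instantiations in Q.
\max_{\pi} \; \frac{1}{|Q|} \sum_{q \in Q} V^{q}(\pi)

% Introduce a local policy copy \pi^{q} per instantiation, tied to the
% shared policy \pi by consistency constraints:
\max_{\pi,\,\{\pi^{q}\}} \; \frac{1}{|Q|} \sum_{q \in Q} V^{q}(\pi^{q})
\quad \text{s.t.} \quad \pi^{q} = \pi \;\; \forall q \in Q

% Relaxing the consistency constraints with multipliers \lambda^{q} gives a
% Lagrangian that is a sum over instantiations, so each term can be optimized
% independently as a single-model finite-horizon MDP, with the multipliers
% updated by (sub)gradient steps on the dual:
L\big(\{\pi^{q}\}, \pi, \{\lambda^{q}\}\big)
  = \sum_{q \in Q} \Big[ \tfrac{1}{|Q|}\, V^{q}(\pi^{q})
      + \big\langle \lambda^{q}, \pi^{q} - \pi \big\rangle \Big]

Because the Lagrangian separates across q, the per-instantiation subproblems can be solved in parallel, which is the computational leverage that dual decomposition provides for separable objectives such as AVM and CPM.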
