Solving K-MDPs

Markov Decision Processes (MDPs) are employed to model sequential decision-making problems under uncertainty. Traditionally, algorithms for solving MDPs have focused on scaling to large state or action spaces. With the increasing application of MDPs to human-operated domains such as biodiversity conservation and health, developing easy-to-interpret solutions is of paramount importance to increase uptake of MDP policies. Here, we define the problem of solving K-MDPs: given an original MDP and a constraint K on the number of states, generate a reduced-state-space MDP that minimizes the difference between the optimal value function of the original MDP and the optimal value function of the K-MDP. Building on existing non-transitive and transitive approximate state abstraction functions, we propose a family of three algorithms based on binary search, with sub-optimality bounded polynomially in a precision parameter: ϕ_{Q*ε} K-MDP ILP, ϕ_{Q*d} K-MDP, and ϕ_{a*d} K-MDP. We compare these algorithms to a greedy algorithm (ϕ_{Q*ε} Greedy K-MDP) and a clustering approach (k-means++ K-MDP). On randomly generated MDPs and two computational sustainability MDPs, ϕ_{a*d} K-MDP outperformed all other algorithms whenever it could find a feasible solution. While numerous state abstraction problems have been proposed in the literature, this is the first time the general problem of solving K-MDPs has been formulated. We hope our work will spur future research aimed at increasing the interpretability of MDP policies in human-operated domains.
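To make the binary-search idea concrete, the sketch below illustrates one plausible variant (not the paper's implementation): Q* is first computed by value iteration, states whose Q*-values agree within ε for every action are greedily aggregated (the non-transitive ϕ_{Q*ε} abstraction), and a binary search looks for the smallest ε whose abstraction uses at most K abstract states. All function and variable names are illustrative assumptions; the paper's ILP and transitive (d-based) variants are not reproduced here.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Compute Q* for an MDP with transitions P[a, s, s'] and rewards R[s, a]."""
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    while True:
        V = Q.max(axis=1)
        Q_new = R + gamma * np.einsum('ast,t->sa', P, V)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new

def greedy_phi_qstar_eps(Q, eps):
    """Greedily group states whose Q*-vectors differ by at most eps in every action
    (a non-transitive phi_{Q*,eps}-style abstraction); returns one label per state."""
    n_states = Q.shape[0]
    labels = -np.ones(n_states, dtype=int)
    reps = []  # representative Q*-vector of each abstract state
    for s in range(n_states):
        for k, q_rep in enumerate(reps):
            if np.max(np.abs(Q[s] - q_rep)) <= eps:
                labels[s] = k
                break
        else:
            labels[s] = len(reps)
            reps.append(Q[s])
    return labels

def k_mdp_binary_search(Q, K, precision=1e-3):
    """Binary-search the smallest eps whose greedy abstraction uses at most K abstract states."""
    lo, hi = 0.0, np.ptp(Q) + precision  # eps = spread of Q* is always feasible (one cluster)
    best = greedy_phi_qstar_eps(Q, hi)
    while hi - lo > precision:
        mid = 0.5 * (lo + hi)
        labels = greedy_phi_qstar_eps(Q, mid)
        if labels.max() + 1 <= K:
            best, hi = labels, mid   # feasible: try a tighter eps
        else:
            lo = mid                 # infeasible: allow a coarser abstraction
    return best

# Toy usage on a randomly generated MDP
rng = np.random.default_rng(0)
n_s, n_a = 30, 3
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))  # each P[a, s, :] sums to 1
R = rng.random((n_s, n_a))
Q_star = value_iteration(P, R)
labels = k_mdp_binary_search(Q_star, K=8)
print("abstract states used:", labels.max() + 1)
```

Once the state labels are fixed, the abstract transition and reward functions can be built by aggregating the original model over each group of states, and the resulting K-state MDP solved exactly to obtain an interpretable policy.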
