Solving K-MDPs
Olivier Buffet | Thomas G. Dietterich | Jonathan Ferrer-Mestres | Iadine Chadès
[1] Marie-Josée Cros, et al. MDPtoolbox: a multi-platform toolbox to solve stochastic dynamic programming problems, 2014.
[2] Sergei Vassilvitskii, et al. k-means++: the advantages of careful seeding, 2007, SODA '07.
[3] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[4] Michael L. Littman, et al. Near Optimal Behavior via Approximate State Abstraction, 2016, ICML.
[5] Robert Givan, et al. Model Minimization in Markov Decision Processes, 1997, AAAI/IAAI.
[6] Peter Stone, et al. State Abstraction Discovery from Irrelevant State Variables, 2005, IJCAI.
[7] I. Chadès, et al. Setting Realistic Recovery Targets for Two Interacting Endangered Species, Sea Otter and Northern Abalone, 2012, Conservation Biology: the journal of the Society for Conservation Biology.
[8] David Andre, et al. State abstraction for programmable reinforcement learning agents, 2002, AAAI/IAAI.
[9] Marek Petrik, et al. Interpretable Policies for Dynamic Product Recommendations, 2016, UAI.
[10] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[11] Lucile Marescot, et al. Complex decisions made simple: a primer on stochastic dynamic programming, 2013.
[12] Thomas G. Dietterich, et al. Three New Algorithms to Solve N-POMDPs, 2017, AAAI.
[13] Cynthia Rudin, et al. Learning Cost-Effective and Interpretable Treatment Regimes, 2017, AISTATS.
[14] Peng Wei, et al. Explainable Deterministic MDPs, 2018, ArXiv.
[15] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[16] Thomas G. Dietterich. State Abstraction in MAXQ Hierarchical Reinforcement Learning, 1999, NIPS.
[17] Subbarao Kambhampati, et al. Plan Explanations as Model Reconciliation, 2019, 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[18] Ronald D. Dutton, et al. On clique covers and independence numbers of graphs, 1983, Discrete Mathematics.
[19] Craig Boutilier, et al. Abstraction and Approximate Decision-Theoretic Planning, 1997, Artificial Intelligence.
[20] Richard M. Karp, et al. Reducibility Among Combinatorial Problems, 1972, 50 Years of Integer Programming.
[21] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Mathematics of Operations Research.
[22] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[23] Joelle Pineau, et al. Policy-contingent abstraction for robust robot control, 2002, UAI.
[24] M. Littman, et al. Toward Good Abstractions for Lifelong Learning, 2017.
[25] Tim Miller, et al. Model-based contrastive explanations for explainable planning, 2019.
[26] Scott Sanner, et al. Bounded Approximate Symbolic Dynamic Programming for Hybrid MDPs, 2013, UAI.
[27] Franco Turini, et al. A Survey of Methods for Explaining Black Box Models, 2018, ACM Computing Surveys.
[28] Martin Péron, et al. Selecting simultaneous actions of different durations to optimally manage an ecological network, 2017.
[29] Thomas G. Dietterich, et al. α-min: A Compact Approximate Solver For Finite-Horizon POMDPs, 2015, IJCAI.
[30] S. P. Lloyd, et al. Least squares quantization in PCM, 1982, IEEE Transactions on Information Theory.
[31] Michael L. Littman, et al. State Abstractions for Lifelong Reinforcement Learning, 2018, ICML.
[32] Paulo J. G. Lisboa, et al. Making machine learning models interpretable, 2012, ESANN.