Policy Explanation in Factored Markov Decision Processes

In this paper we address the problem of explaining the recommendations returned by a Markov decision process (MDP) that is part of an intelligent assistant for operator training. When analyzing the explanations provided by human experts, we observed that they focused on the “most relevant variable”, i.e., the variable that, in the current state of the system, has the highest influence on the choice of the optimal action. We propose two heuristic rules for determining the most relevant variable based on a factored representation of an MDP. The first rule estimates the impact of each variable on the expected utility. The second rule evaluates, for each variable, the potential changes in the optimal action. We evaluated and compared both rules in the power plant domain, where we have a set of explanations, including the most relevant variable, given by a domain expert. Our experiments show strong agreement between the variable selected by human experts and the one selected by our method for a representative sample of states.
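The two heuristic rules described above can be sketched in code. This is a minimal illustration, assuming a toy factored state with two binary variables; the variable names, policy table, and value table are hypothetical stand-ins, not the paper's power-plant model.

```python
# Toy factored MDP: each state is a tuple of variable values.
# The policy maps states to optimal actions; value holds expected utilities.
# All entries here are illustrative assumptions.
variables = {"temperature": [0, 1], "pressure": [0, 1]}
policy = {(0, 0): "wait", (0, 1): "shutdown",
          (1, 0): "wait", (1, 1): "shutdown"}
value = {(0, 0): 10.0, (0, 1): 6.0, (1, 0): 9.0, (1, 1): 1.0}

def utility_impact(state, idx, name):
    """Rule 1: spread of the expected utility when only this variable varies."""
    vals = [value[state[:idx] + (v,) + state[idx + 1:]]
            for v in variables[name]]
    return max(vals) - min(vals)

def action_changes(state, idx, name):
    """Rule 2: number of alternative optimal actions this variable induces."""
    acts = {policy[state[:idx] + (v,) + state[idx + 1:]]
            for v in variables[name]}
    return len(acts) - 1

def most_relevant(state, rule):
    """Select the variable that scores highest under the given rule."""
    scores = {name: rule(state, i, name)
              for i, name in enumerate(variables)}
    return max(scores, key=scores.get)

state = (1, 1)
print(most_relevant(state, utility_impact))  # → pressure
print(most_relevant(state, action_changes))  # → pressure
```

In this toy example both rules agree: varying `pressure` both changes the optimal action and swings the expected utility more than `temperature` does, so it is reported as the most relevant variable for the explanation.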
