Minimal Sufficient Explanations for Factored Markov Decision Processes

Explaining policies of Markov Decision Processes (MDPs) is complicated due to their probabilistic and sequential nature. We present a technique to explain policies for factored MDPs by populating a set of domain-independent templates. We also present a mechanism to determine a minimal set of templates that, viewed together, completely justify the policy. Our explanations can be generated automatically at run-time with no additional effort required from the MDP designer. We demonstrate our technique on two problems: advising undergraduate students in their course selection and assisting people with dementia in completing the task of handwashing. We also evaluate our explanations for course-advising through a user study involving students.
