Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Black Box Simulators

As increasingly complex AI systems are introduced into our daily lives, it becomes important for such systems to be capable of explaining the rationale for their decisions and allowing users to contest these decisions. A significant hurdle to such explanatory dialogue is the vocabulary mismatch between the user and the AI system. This paper introduces methods for providing contrastive explanations in terms of user-specified concepts for sequential decision-making settings where the system's model of the task may be best represented as a black-box simulator. We do this by building partial symbolic models of the task that can be leveraged to answer user queries. We empirically test these methods on a popular Atari game (Montezuma's Revenge) and modified versions of Sokoban (a well-known planning benchmark) and report the results of user studies evaluating whether people find explanations generated in this form useful.
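To make the idea concrete, the following is a minimal sketch (not the authors' code) of one way a partial symbolic model over user-specified concepts could answer a contrastive query of the form "why not the action I suggested?". It assumes binary concept classifiers over raw simulator states have been learned separately (e.g., from user-labelled examples), and the names learn_failing_precondition and explain_foil are hypothetical.

    # Minimal sketch, assuming binary concept classifiers over raw simulator
    # states (e.g., Atari frames) and a black-box applicability test for the
    # user's suggested (foil) action. Function names are hypothetical.
    from typing import Callable, Dict, List, Optional

    # A concept classifier maps a raw state to True/False.
    ConceptClassifier = Callable[[object], bool]

    def learn_failing_precondition(
        concepts: Dict[str, ConceptClassifier],
        sampled_states: List[object],
        action_applicable: Callable[[object], bool],
    ) -> Optional[str]:
        """Return a concept whose truth value matches the action's
        applicability on every sampled state -- a candidate (partial)
        symbolic precondition for that action."""
        for name, holds in concepts.items():
            if all(holds(s) == action_applicable(s) for s in sampled_states):
                return name
        return None

    def explain_foil(failing_concept: Optional[str]) -> str:
        """Phrase the contrastive answer to 'why not the suggested action?'."""
        if failing_concept is None:
            return "No missing concept was found; the foil may simply lead to a worse outcome."
        return ("The suggested action is not applicable here because the concept "
                f"'{failing_concept}' does not hold in the current state.")

In this sketch, sampled_states would be gathered by running the black-box simulator, and the returned concept name serves as the user-vocabulary explanation for why the foil plan fails.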
