Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods

Current work in explainable reinforcement learning generally produces policies in the form of a decision tree over the state space. Such policies can be used for formal safety verification, agent behavior prediction, and manual inspection of important features. However, existing approaches either fit a decision tree after training or use a custom learning procedure that is incompatible with newer learning techniques, such as those that use neural networks. To address this limitation, we propose a novel Markov Decision Process (MDP) type for learning decision tree policies: Iterative Bounding MDPs (IBMDPs). An IBMDP is constructed around a base MDP so that, when a method-agnostic masking procedure is used, each IBMDP policy is guaranteed to correspond to a decision tree policy for the base MDP. Because of this decision tree equivalence, any function approximator can be used during training, including a neural network, while still yielding a decision tree policy for the base MDP. We present the required masking procedure as well as a modified value update step that allows IBMDPs to be solved with existing algorithms. We apply this procedure to produce IBMDP variants of recent reinforcement learning methods. We empirically demonstrate the benefits of our approach by solving IBMDPs to produce decision tree policies for the base MDPs.
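
The abstract describes the IBMDP construction only at a high level, so the following is a minimal Python sketch of the idea as stated there: a wrapper around a base MDP whose observations expose only per-feature bounds (the raw base state is masked out), plus extra information-gathering actions that tighten one bound at a time, so that a greedy policy over these observations reads as a sequence of feature-threshold splits, i.e. a decision tree. The class and parameter names (`IterativeBoundingWrapper`, `zeta`), the midpoint split rule, the per-step penalty, and the bound reset after each base action are illustrative assumptions, not details taken from the paper.

```python
# Minimal illustrative sketch of an IBMDP-style wrapper (assumptions noted above),
# not the authors' implementation.
import numpy as np


class ToyBaseMDP:
    """Toy base MDP: a point on [0, 1], nudged left (action 0) or right (action 1)."""
    def reset(self):
        self.x = np.random.rand(1)
        return self.x

    def step(self, action):
        self.x = np.clip(self.x + (0.1 if action == 1 else -0.1), 0.0, 1.0)
        reward = float(self.x[0] > 0.8)   # reward for reaching the right side
        return self.x, reward, reward > 0, {}


class IterativeBoundingWrapper:
    def __init__(self, base_env, n_features, zeta=-0.01):
        self.base_env = base_env          # base MDP with reset()/step(a)
        self.n_features = n_features
        self.zeta = zeta                  # assumed penalty per information-gathering step
        self.state = None                 # hidden (masked) base state

    def reset(self):
        self.state = self.base_env.reset()
        self.lower = np.zeros(self.n_features)   # assumes features scaled to [0, 1]
        self.upper = np.ones(self.n_features)
        return self._observe()

    def _observe(self):
        # Masked observation: only the current bounds, never the raw base state.
        return np.concatenate([self.lower, self.upper])

    def step(self, action):
        if action < self.n_features:
            # Information-gathering action: compare feature `action` to the midpoint
            # of its current bounds and tighten one side (assumed split rule).
            f = action
            threshold = 0.5 * (self.lower[f] + self.upper[f])
            if self.state[f] <= threshold:
                self.upper[f] = threshold
            else:
                self.lower[f] = threshold
            return self._observe(), self.zeta, False, {}
        # Base action: act in the underlying MDP and reset the bounds,
        # i.e. restart at the root of the tree for the next base decision (assumed).
        self.state, reward, done, info = self.base_env.step(action - self.n_features)
        self.lower[:], self.upper[:] = 0.0, 1.0
        return self._observe(), reward, done, info


# Usage: split on feature 0, then take base action 0.
env = IterativeBoundingWrapper(ToyBaseMDP(), n_features=1)
obs = env.reset()
obs, r, done, _ = env.step(0)   # refine bounds on feature 0
obs, r, done, _ = env.step(1)   # base action 0 (offset by n_features)
```

Because the observation contains only bounds, any function approximator trained on this wrapper (a neural network included) induces a policy whose information-gathering choices can be read off as decision tree splits for the base MDP.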
