Self-Supervised Discovering of Causal Features: Towards Interpretable Reinforcement Learning

Deep reinforcement learning (RL) has recently led to many breakthroughs on a range of complex control tasks. However, the agent's decision-making process is generally not transparent, and this lack of interpretability hinders the application of RL in safety-critical scenarios. In this paper, we propose a self-supervised interpretable framework that employs a self-supervised interpretable network (SSINet) to discover and locate the fine-grained causal features that constitute the most evidence for the agent's decisions. We verify and evaluate our method on several Atari 2600 games as well as Duckietown. The results show that our method provides causal explanations and empirical evidence about how the agent makes decisions and why it performs well or badly. Moreover, our method is a flexible explanatory module that can be applied to most vision-based RL agents. Overall, our method offers valuable insight into interpretable vision-based RL.
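
To make the self-supervision concrete, here is a minimal sketch of the kind of objective the abstract describes: a mask network is trained so that the pretrained agent, seeing only the masked observation, reproduces its own decisions, while a sparsity penalty forces the mask to keep only the decisive regions. The sketch assumes PyTorch, a frozen pretrained policy `agent` that maps observations to action logits, and a hypothetical `MaskNet`; all names and hyperparameters are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskNet(nn.Module):
    """Hypothetical encoder-decoder that predicts a soft attention
    mask (values in [0, 1]) over the input observation."""

    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.decoder(self.encoder(obs)))


def ssinet_loss(agent: nn.Module, mask_net: MaskNet,
                obs: torch.Tensor, sparsity_coef: float = 0.05) -> torch.Tensor:
    """Self-supervised objective: the frozen agent, acting only on the
    masked observation, must reproduce its own decisions on the full
    observation, while the mask is kept as sparse as possible."""
    with torch.no_grad():
        target_actions = agent(obs).argmax(dim=-1)  # agent's own decisions serve as labels
    mask = mask_net(obs)            # (B, 1, H, W), broadcast over channels
    masked_obs = obs * mask         # keep only the attended evidence
    pred_logits = agent(masked_obs)
    # Behavior matching for discrete actions (Atari); a continuous-control
    # agent (e.g. Duckietown) would use a regression loss such as MSE instead.
    behavior_loss = F.cross_entropy(pred_logits, target_actions)
    sparsity_loss = mask.mean()     # push the mask toward minimal evidence
    return behavior_loss + sparsity_coef * sparsity_loss
```

The two terms trade off against each other: the behavior-matching loss forces the mask to retain every feature the agent's decision depends on, while the sparsity term prunes everything else, so the surviving regions can be read as the evidence behind the decision.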
