Self-Supervised Discovering of Causal Features: Towards Interpretable Reinforcement Learning

Deep reinforcement learning (RL) has recently led to many breakthroughs on a range of complex control tasks. However, the agent's decision-making process is generally not transparent, and this lack of interpretability hinders the adoption of RL in safety-critical scenarios. In this paper, we propose a self-supervised interpretable framework that employs a self-supervised interpretable network (SSINet) to discover and locate the fine-grained causal features that constitute most of the evidence for the agent's decisions. We verify and evaluate our method on several Atari 2600 games as well as Duckietown, a self-driving environment. The results show that our method provides causal explanations and empirical evidence about how the agent makes decisions and why it performs well or badly. Moreover, our method is a flexible explanatory module that can be applied to most vision-based RL agents. Overall, our method offers valuable insight into interpretable vision-based RL.
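
To make the self-supervised idea concrete, the following is a minimal sketch of one plausible way such a framework could be trained: a small mask network produces an attention mask over the observation, the frozen pretrained agent's own outputs serve as self-supervised labels, and a sparsity penalty pushes the mask toward the most decision-relevant regions. All names here (MaskNet, pretrained_policy, sparsity_coef) and the exact loss are illustrative assumptions, not the paper's actual SSINet architecture or objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskNet(nn.Module):
    """Tiny encoder-decoder mapping an observation to a [0, 1] attention mask."""

    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(obs))


def train_step(mask_net, pretrained_policy, obs, optimizer, sparsity_coef=1e-3):
    """One self-supervised update (sketch): the masked observation should lead the
    frozen policy to the same decision as the full observation, with a sparse mask."""
    with torch.no_grad():
        target_logits = pretrained_policy(obs)   # labels come from the frozen agent itself
    mask = mask_net(obs)                         # (B, 1, H, W) attention mask
    masked_obs = obs * mask                      # keep only the highlighted regions
    pred_logits = pretrained_policy(masked_obs)

    # Behaviour-matching term: decision on masked input ~ decision on full input.
    behaviour_loss = F.kl_div(
        F.log_softmax(pred_logits, dim=-1),
        F.softmax(target_logits, dim=-1),
        reduction="batchmean",
    )
    # Sparsity term: discourage masks that simply keep the whole frame.
    sparsity_loss = mask.mean()
    loss = behaviour_loss + sparsity_coef * sparsity_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design choice in this sketch is that no human labels are needed: the pretrained agent supervises the mask network, so the resulting masks can be read as the regions the agent's decisions actually depend on.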
