Self-Supervised Discovering of Interpretable Features for Reinforcement Learning

Deep reinforcement learning (RL) has recently led to many breakthroughs on a range of complex control tasks. However, the decision-making process is generally not transparent, and this lack of interpretability hinders its applicability in safety-critical scenarios. While several methods have attempted to interpret vision-based RL, most offer no detailed explanation of the agent's behaviour. In this paper, we propose a self-supervised interpretable framework that discovers causal features and makes RL easy to interpret, even for non-experts. Specifically, a self-supervised interpretable network produces fine-grained masks that highlight task-relevant information, which constitutes most of the evidence for the agent's decisions. We verify and evaluate our method on several Atari 2600 games and on Duckietown, a challenging self-driving car simulator environment. The results show that our method renders causal explanations and empirical evidence about how the agent makes decisions and why it performs well or badly. Overall, our method provides valuable insight into the decision-making process of RL. In addition, our method does not use any external labelled data, demonstrating that high-quality masks can be learned in a self-supervised manner, which may shed light on new paradigms for label-free vision learning such as self-supervised segmentation and detection.
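To make the idea of self-supervised mask learning concrete, the minimal PyTorch sketch below illustrates one plausible instantiation; the `MaskNet` architecture, the fidelity/sparsity objective, and all names are illustrative assumptions rather than the paper's exact design. A small encoder-decoder predicts a soft per-pixel mask, the frozen agent is queried on the masked observation, and the mask is trained so that the agent's output is preserved while the highlighted region stays small.

```python
# Minimal, assumption-laden sketch of self-supervised mask learning for a frozen RL agent.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskNet(nn.Module):
    """Produces a soft mask in [0, 1] with the same spatial size as the input frames."""
    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.decoder(self.encoder(obs)))

def mask_loss(agent, mask_net, obs, sparsity_weight: float = 0.01):
    """Self-supervised objective (assumed form): keep the frozen agent's output on the
    masked observation close to its output on the full observation, while encouraging
    the mask to highlight as little of the frame as possible."""
    mask = mask_net(obs)                  # (B, 1, H, W) soft mask
    masked_obs = mask * obs               # keep only the highlighted pixels
    with torch.no_grad():
        target = agent(obs)               # agent's decision on the full frame (e.g. Q-values)
    pred = agent(masked_obs)              # agent's decision on the masked frame
    fidelity = F.mse_loss(pred, target)   # decisions should be preserved
    sparsity = mask.mean()                # masks should stay fine-grained / sparse
    return fidelity + sparsity_weight * sparsity
```

Under this assumed objective, no external labels are needed: the frozen agent itself supervises the mask, and the sparsity term forces the mask to retain only the pixels that actually drive the agent's decision.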
