Explainability in Deep Reinforcement Learning

A large body of the explainable Artificial Intelligence (XAI) literature is emerging on feature relevance techniques that explain the output of a deep neural network (DNN), or on explaining models that ingest image source data. However, assessing how XAI techniques can help understand models beyond classification tasks, e.g. for reinforcement learning (RL), has not been studied extensively. We review recent works aimed at attaining Explainable Reinforcement Learning (XRL), a relatively new subfield of Explainable Artificial Intelligence intended for general public applications with diverse audiences, which require ethical, responsible and trustworthy algorithms. In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight into the inner workings of what is still considered a black box. We mainly evaluate studies that directly link explainability to RL, and split these into two categories according to the way the explanations are generated: transparent algorithms and post-hoc explainability. We also review the most prominent XAI works through the lens of how they could inform the further deployment of the latest advances in RL to the demanding everyday problems of the present and future.
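
To make the post-hoc category concrete, the sketch below applies gradient saliency, a canonical post-hoc technique carried over from the image-classification XAI literature, to a value-based RL agent: it measures how sensitive the greedy action's Q-value is to each input feature. This is a minimal illustration assuming PyTorch; "q_network" and its flat observation format are hypothetical placeholders, not an interface from any specific library.

```python
# Minimal sketch of post-hoc explainability for a value-based RL agent,
# assuming PyTorch. "q_network" is a hypothetical torch.nn.Module mapping
# a flat observation tensor to one Q-value per action.
import torch

def saliency_map(q_network: torch.nn.Module, obs: torch.Tensor) -> torch.Tensor:
    """Gradient saliency: |d Q(s, a*) / d s| for the greedy action a*."""
    obs = obs.clone().detach().requires_grad_(True)    # track gradients w.r.t. the input
    q_values = q_network(obs.unsqueeze(0)).squeeze(0)  # shape: (n_actions,)
    greedy_action = q_values.argmax()
    q_values[greedy_action].backward()                 # populates obs.grad
    return obs.grad.abs()                              # per-feature relevance scores
```

Such maps should be interpreted with caution: randomization-style sanity checks have shown that some saliency methods are largely insensitive to the model they claim to explain, so they are evidence about the policy, not proof.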
