Extracting Relational Explanations From Deep Neural Networks: A Survey From a Neural-Symbolic Perspective

The term “explainable AI” refers to the goal of producing artificially intelligent agents that are capable of explaining their decisions. Some models (e.g., rule-based systems) are designed to be explainable, while others are less explicit “black boxes” whose reasoning remains a mystery. The neural network is one example of the latter, and over the past few decades, researchers in the field of neural-symbolic integration (NSI) have sought to extract relational knowledge from such networks. Extraction from deep neural networks, however, remained a challenge until recent years, in which many methods have been proposed for extracting distinct, salient features from the input or hidden feature spaces of deep neural networks. Methods of identifying relationships between these features have also emerged. This article presents examples of old and new developments in extracting relational explanations in order to argue that the latter have analogies in the former and, as such, can be described in terms of the long-established taxonomies and frameworks presented in early neural-symbolic literature. We also outline potential future research directions that arise from this refreshed perspective.
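As a concrete illustration of the kind of “salient feature” extraction the abstract alludes to, the sketch below computes a simple gradient-based saliency map over a network's input. This is a minimal, illustrative example rather than any specific method surveyed here; the model and tensor shapes are assumptions chosen only to make the snippet self-contained and runnable.

```python
# Minimal sketch (assumed, not from the survey): gradient-based input saliency.
import torch
import torch.nn as nn

# Hypothetical small classifier standing in for any trained "black box" network.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
model.eval()

x = torch.rand(1, 1, 28, 28, requires_grad=True)  # one example input
score = model(x)[0].max()                          # score of the predicted class
score.backward()                                   # gradient of that score w.r.t. the input

# Inputs with large absolute gradient are treated as the "salient features".
saliency = x.grad.abs().squeeze()
print(saliency.shape)  # torch.Size([28, 28])
```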
