Towards falsifiable interpretability research

Methods for understanding the decisions of and mechanisms underlying deep neural networks (DNNs) typically rely on building intuition by emphasizing sensory or semantic features of individual examples. For instance, methods aim to visualize the components of an input which are "important" to a network's decision, or to measure the semantic properties of single neurons. Here, we argue that interpretability research suffers from an over-reliance on intuition-based approaches that risk (and in some cases have caused) illusory progress and misleading conclusions. We identify a set of limitations that we argue impede meaningful progress in interpretability research, and examine two popular classes of interpretability methods, saliency maps and single-neuron-based approaches, that serve as case studies for how over-reliance on intuition and lack of falsifiability can undermine interpretability research. To address these impediments, we propose a framework for strongly falsifiable interpretability research. We encourage researchers to use their intuitions as a starting point to develop and test clear, falsifiable hypotheses, and we hope that our framework yields robust, evidence-based interpretability methods that generate meaningful advances in our understanding of DNNs.
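
As a concrete illustration of the first class of methods discussed above, the sketch below computes a vanilla gradient saliency map: the magnitude of the gradient of the predicted class's score with respect to each input pixel. This is a minimal, illustrative example only; the function name, the assumption of a preprocessed (3, H, W) image tensor, and the choice of classifier in the usage comment are our own assumptions, not a method proposed in this work.

```python
import torch

def vanilla_gradient_saliency(model, image):
    """Minimal sketch: per-pixel |d(top-class score)/d(input)| for one image.

    Assumes `image` is a preprocessed (3, H, W) float tensor and `model`
    is an image classifier returning class logits.
    """
    model.eval()
    image = image.clone().requires_grad_(True)   # track gradients w.r.t. the input
    logits = model(image.unsqueeze(0))           # forward pass with a batch dimension
    top_class = logits.argmax(dim=1).item()      # class whose score we "explain"
    logits[0, top_class].backward()              # backpropagate the top-class logit
    # Collapse the channel dimension so there is one importance value per pixel.
    return image.grad.abs().max(dim=0).values

# Hypothetical usage (model and image names are placeholders):
# saliency = vanilla_gradient_saliency(some_imagenet_classifier, preprocessed_image)
```

A single-neuron analysis in the same spirit would instead record one unit's activations across inputs and summarize them, for example with a class-selectivity index; both families of methods are the case studies examined in the paper.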


[102]  Klaus-Robert Müller,et al.  Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models , 2017, ArXiv.

[103]  Matthew Botvinick,et al.  On the importance of single directions for generalization , 2018, ICLR.

[104]  E. Adrian,et al.  The impulses produced by sensory nerve endings , 1926, The Journal of physiology.

[105]  Ronald M. Summers,et al.  ChestX-ray: Hospital-Scale Chest X-ray Database and Benchmarks on Weakly Supervised Classification and Localization of Common Thorax Diseases , 2019, Deep Learning and Convolutional Neural Networks for Medical Imaging and Clinical Informatics.