Natural Language Descriptions of Deep Visual Features

Some neurons in deep networks specialize in recognizing highly specific perceptual, structural, or semantic features of inputs. In computer vision, techniques exist for identifying neurons that respond to individual concept categories like colors, textures, and object classes. But these techniques are limited in scope, labeling only a small subset of neurons and behaviors in any network. Is a richer characterization of neuron-level computation possible? We introduce a procedure (called MILAN, for mutual-information-guided linguistic annotation of neurons) that automatically labels neurons with open-ended, compositional, natural language descriptions. Given a neuron, MILAN generates a description by searching for a natural language string that maximizes pointwise mutual information with the image regions in which the neuron is active. MILAN produces fine-grained descriptions that capture categorical, relational, and logical structure in learned features. These descriptions obtain high agreement with human-generated feature descriptions across a diverse set of model architectures and tasks, and can aid in understanding and controlling learned models. We highlight three applications of natural language neuron descriptions. First, we use MILAN for analysis, characterizing the distribution and importance of neurons selective for attribute, category, and relational information in vision models. Second, we use MILAN for auditing, surfacing neurons sensitive to protected categories like race and gender in models trained on datasets intended to obscure these features. Finally, we use MILAN for editing, improving robustness in an image classifier by deleting neurons sensitive to text features spuriously correlated with class labels.
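
MILAN's core selection step can be made concrete with a small sketch. The snippet below is a minimal illustration, not the paper's implementation: `milan_describe`, `log_p_caption`, and `log_p_language` are hypothetical stand-ins for the learned image-to-text model (scored on a neuron's top-activating exemplar regions) and a plain language model, and the `weight` hyperparameter is an assumed knob for trading specificity against fluency.

```python
from typing import Callable, Sequence

def milan_describe(
    candidates: Sequence[str],
    log_p_caption: Callable[[str], float],   # log p(d | exemplars): captioner scored on the neuron's top-activating regions
    log_p_language: Callable[[str], float],  # log p(d): plain language model, penalizing generic strings
    weight: float = 1.0,                     # assumed trade-off weight; the paper's exact weighting may differ
) -> str:
    """Return the candidate with the highest PMI-style score:
    log p(d | exemplars) - weight * log p(d)."""
    return max(candidates, key=lambda d: log_p_caption(d) - weight * log_p_language(d))

# Toy usage with stand-in score tables (real MILAN uses learned models for both terms):
if __name__ == "__main__":
    caption_scores = {"fur": -2.0, "dog faces": -2.5, "an image": -1.8}
    language_scores = {"fur": -4.0, "dog faces": -9.0, "an image": -2.0}
    best = milan_describe(
        list(caption_scores),
        log_p_caption=caption_scores.__getitem__,
        log_p_language=language_scores.__getitem__,
    )
    print(best)  # "dog faces": likely given the exemplars, unlikely a priori
```

Subtracting log p(d) is what steers the search away from generic captions like "an image": a description scores well only if it is both probable given the exemplar regions and improbable under the language model alone.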
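
The editing application relies on neuron ablation: once MILAN surfaces units described as text detectors, those units are deleted and the classifier re-evaluated. Below is a hedged sketch of one common way to do this in PyTorch, via a forward hook that zeroes selected channels; the paper may implement deletion differently, and the layer and unit indices here are purely illustrative.

```python
import torch
from torch import nn

def ablate_units(module: nn.Module, unit_indices: list[int]):
    """Zero out selected output channels of `module` on every forward pass.
    Assumes the output is a (batch, channels, ...) activation tensor."""
    def hook(mod, inputs, output):
        output = output.clone()          # avoid in-place mutation of the original output
        output[:, unit_indices] = 0.0    # "delete" the chosen neurons
        return output                    # returned tensor replaces the module's output
    return module.register_forward_hook(hook)

# Example: silence two hypothetical text-sensitive units in a ResNet block.
if __name__ == "__main__":
    from torchvision.models import resnet18
    model = resnet18(weights=None).eval()
    handle = ablate_units(model.layer4[1].conv2, [12, 305])
    with torch.no_grad():
        _ = model(torch.randn(1, 3, 224, 224))
    handle.remove()  # restore the original behavior
```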
