Editing a classifier by rewriting its prediction rules

We present a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules. Our approach requires virtually no additional data collection and can be applied to a variety of settings, including adapting a model to new environments and modifying it to ignore spurious features.
