Towards Robust Classification Model by Counterfactual and Invariant Data Generation

Despite the success of machine learning applications across science, industry, and society, many approaches are known to be non-robust, often relying on spurious correlations to make predictions. Spuriousness arises when some features correlate with labels but are not causal; relying on such features prevents a model from generalizing to unseen environments where those correlations break. In this work, we focus on image classification and propose two data generation processes to reduce spuriousness. Given human annotations (e.g., bounding boxes) of the subset of features that are causally responsible for the labels, we modify this causal set to generate surrogate images that no longer carry the same label (i.e., counterfactual images). We also alter non-causal features to generate images that are still recognized as the original labels, which helps the model learn to be invariant to these features. On several challenging datasets, our data generation methods outperform state-of-the-art approaches in accuracy when spurious correlations break, and they increase the saliency focus on causal features, yielding better explanations.
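To make the two generation processes concrete, below is a minimal sketch under stated assumptions: an image with a human-annotated causal bounding box, a counterfactual version produced by overwriting the causal region (the paper considers richer fills such as inpainting; random noise here is only an illustrative stand-in), and an invariant version produced by perturbing only the non-causal background. The function names, the bbox format, and the noise-based edits are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def counterfactual_image(img, bbox):
    """Overwrite the annotated causal region so the original label no longer applies.

    img  : HxWxC uint8 array
    bbox : (y0, y1, x0, x1) human-annotated causal region (assumed format)
    Random-noise fill is a stand-in; other fills (grayscale, shuffle, inpainting) are possible.
    """
    y0, y1, x0, x1 = bbox
    out = img.copy()
    out[y0:y1, x0:x1] = np.random.randint(
        0, 256, out[y0:y1, x0:x1].shape, dtype=np.uint8
    )
    return out

def invariant_image(img, bbox, noise_std=20.0):
    """Perturb only the non-causal background; the causal region and label are preserved."""
    y0, y1, x0, x1 = bbox
    out = img.astype(np.float32)
    noise = np.random.normal(0.0, noise_std, out.shape)
    mask = np.ones(img.shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = False          # keep the causal region untouched
    out[mask] += noise[mask]
    return np.clip(out, 0, 255).astype(np.uint8)

# Augmented training pairs: counterfactual images are no longer assigned the original
# label, while invariant images keep it, encouraging the classifier to rely on the
# causal region rather than on spurious background features.
```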
