Latent Adversarial Debiasing: Mitigating Collider Bias in Deep Neural Networks

Collider bias is a harmful form of sample selection bias that neural networks are illequipped to handle. This bias manifests itself when the underlying causal signal is strongly correlated with other confounding signals due to the training data collection procedure. In the situation where the confounding signal is easy-to-learn, deep neural networks will latch onto this and the resulting model will generalise poorly to in-the-wild test scenarios. We argue herein that the cause of failure is a combination of the deep structure of neural networks and the greedy gradient-driven learning process used – one that prefers easyto-compute signals when available. We show it is possible to mitigate against this by generating bias-decoupled training data using latent adversarial debiasing (LAD), even when the confounding signal is present in 100% of the training data. By training neural networks on these adversarial examples, we can improve their generalisation in collider bias settings. Experiments show stateof-the-art performance of LAD in label-free debiasing with gains of 76.12% on background coloured MNIST, 35.47% on foreground coloured MNIST, and 8.27% on corrupted CIFAR-10.

[1]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[2]  Ole Winther,et al.  Autoencoding beyond pixels using a learned similarity metric , 2015, ICML.

[3]  Amos J. Storkey,et al.  Censoring Representations with an Adversary , 2015, ICLR.

[4]  Konrad Zolna,et al.  Leakage-Robust Classifier via Mask-Enhanced Training (Student Abstract) , 2020, AAAI.

[5]  Matthijs Douze,et al.  Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.

[6]  Karan Goel,et al.  Model Patching: Closing the Subgroup Performance Gap with Data Augmentation , 2020, ICLR.

[7]  Yoshua Bengio,et al.  A Closer Look at Memorization in Deep Networks , 2017, ICML.

[8]  Matthias Bethge,et al.  Shortcut Learning in Deep Neural Networks , 2020, Nat. Mach. Intell..

[9]  Zhe Zhao,et al.  Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations , 2017, ArXiv.

[10]  R. Thomas McCoy,et al.  Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference , 2019, ACL.

[11]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[12]  Olivier Bachem,et al.  Automatic Shortcut Removal for Self-Supervised Representation Learning , 2020, ICML.

[13]  Seong Joon Oh,et al.  Learning De-biased Representations with Biased Representations , 2020, ICML.

[14]  Yejin Choi,et al.  Adversarial Filters of Dataset Biases , 2020, ICML.

[15]  Bernt Schiele,et al.  Disentangling Adversarial Robustness and Generalization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Thomas G. Dietterich,et al.  Benchmarking Neural Network Robustness to Common Corruptions and Perturbations , 2018, ICLR.

[17]  Aleksander Madry,et al.  Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[18]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[19]  Po-Sen Huang,et al.  Achieving Robustness in the Wild via Adversarial Mixing With Disentangled Representations , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Aleksander Madry,et al.  Exploring the Landscape of Spatial Robustness , 2017, ICML.

[21]  Yoshua Bengio,et al.  Measuring the tendency of CNNs to Learn Surface Statistical Regularities , 2017, ArXiv.

[22]  J. Heckman Sample selection bias as a specification error , 1979 .

[23]  Oriol Vinyals,et al.  Neural Discrete Representation Learning , 2017, NIPS.

[24]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[25]  Blake Lemoine,et al.  Mitigating Unwanted Biases with Adversarial Learning , 2018, AIES.

[26]  Neil D. Lawrence,et al.  When Training and Test Sets Are Different: Characterizing Learning Transfer , 2009 .

[27]  Seyed-Mohsen Moosavi-Dezfooli,et al.  Geometric Robustness of Deep Networks: Analysis and Improvement , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  David Lopez-Paz,et al.  Invariant Risk Minimization , 2019, ArXiv.

[29]  Jinwoo Shin,et al.  Learning from Failure: Training Debiased Classifier from Biased Classifier , 2020, ArXiv.

[30]  Honglak Lee,et al.  SemanticAdv: Generating Adversarial Examples via Attribute-conditional Image Editing , 2019, ECCV.

[31]  Yejin Choi,et al.  Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics , 2020, EMNLP.

[32]  Balaji Lakshminarayanan,et al.  AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty , 2020, ICLR.

[33]  David Lopez-Paz,et al.  In Search of Lost Domain Generalization , 2020, ICLR.