Improved Worst-Group Robustness via Classifier Retraining on Independent Splits

High-capacity deep neural networks (DNNs) trained with Empirical Risk Minimization (ERM) often suffer from poor worst-group accuracy despite good average performance, where worst-group accuracy measures a model’s robustness on particular subpopulations of the input space. This degradation is typically attributed to spurious correlations and memorization behaviors of ERM-trained DNNs. We develop a method, called CRIS, that addresses these issues by performing robust classifier retraining on independent splits of the dataset. The result is a simple method that improves upon state-of-the-art methods, such as Group DRO, on standard datasets while relying on far fewer group labels and little additional hyperparameter tuning.

validation accuracies on the feature extractor’s quality (measured by robust performance after classifier retraining) is presented in Appendix B.2. The data show a positive correlation between validation average accuracy and the quality of the model’s features. This observation serves as a proxy for CRIS’s model selection criterion. A similar positive correlation is observed when validation worst-group accuracy is used.
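To make the two quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of how worst-group accuracy can be computed alongside average accuracy, and of how a dataset can be partitioned into two disjoint index sets. The function names `worst_group_accuracy` and `independent_splits`, the group labels, and the 70/30 split fraction are hypothetical choices made for illustration only; the text above does not specify how CRIS sizes or uses its splits.

```python
import random
from collections import defaultdict


def worst_group_accuracy(y_true, y_pred, groups):
    """Return (average accuracy, worst-group accuracy, per-group accuracy)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    # Accuracy inside each group, then the minimum over groups.
    per_group = {g: correct[g] / total[g] for g in total}
    average = sum(correct.values()) / sum(total.values())
    return average, min(per_group.values()), per_group


def independent_splits(n, frac_first, seed=0):
    """Partition indices 0..n-1 into two disjoint lists (hypothetical helper)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * frac_first)
    return idx[:cut], idx[cut:]


if __name__ == "__main__":
    # Toy example: a majority group classified perfectly, a minority group half wrong.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
    groups = ["maj"] * 6 + ["min"] * 2
    avg, worst, per_group = worst_group_accuracy(y_true, y_pred, groups)
    print(avg, worst)  # 0.875 0.5

    # Assumed 70/30 split: one part for feature learning, one for classifier retraining.
    first, second = independent_splits(len(y_true), 0.7)
    print(len(first), len(second))  # 5 3
```

The toy example shows the gap the abstract describes: average accuracy is high because the majority group dominates, while the minority group's accuracy is much lower. Under the assumption stated above, the first index list would be used to train the feature extractor and the second, disjoint list to retrain the classifier.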
