Improved Group Robustness via Classifier Retraining on Independent Splits

Deep neural networks trained to minimize average risk can achieve strong average performance, but their performance on a subgroup may degrade if that subgroup is underrepresented in the overall data population. Group distributionally robust optimization [Sagawa et al., 2020a, GDRO] is a standard baseline for learning models with strong worst-group performance. However, GDRO requires a group label for every training example and is prone to overfitting, often requiring careful control of model capacity via regularization or early stopping. When only a limited number of group labels is available, Just Train Twice [Liu et al., 2021, JTT] is a popular approach that infers a pseudo group label for every unlabeled example; however, the pseudo-labeling process can be highly sensitive to model selection. To alleviate overfitting in GDRO and the pseudo-labeling sensitivity of JTT, we propose a new method based on classifier retraining on independent splits of the training data. We find that a novel sample-splitting procedure achieves robust worst-group performance in the fine-tuning step. Evaluated on benchmark image and text classification tasks, our approach consistently reduces the need for group labels and hyperparameter search during training. Experimental results confirm that our approach performs favorably compared with existing methods (including GDRO and JTT), whether group labels are available during training or only during validation.
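The sketch below is a loose illustration of the split-and-retrain idea, not the authors' implementation: training data is split into two independent halves, a "feature extractor" (here a simple whitening transform standing in for a learned deep representation) is fit on one half, and only the classifier head is retrained on the held-out half with group-balanced resampling so that every group contributes equally. All names (`balanced_resample`, the synthetic data generator) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic setup: 4 groups = (label, spurious attribute) pairs;
# majority groups have attribute == label, minority groups break the correlation.
n = 2000
g = rng.choice(4, size=n, p=[0.45, 0.45, 0.05, 0.05])
y = g % 2                                 # class label
spur = np.isin(g, [1, 2]).astype(float)   # spurious attribute, == y in majority groups
x = np.stack([spur + 0.3 * rng.normal(size=n),   # low-noise spurious feature
              y + rng.normal(size=n)], axis=1)   # noisy stable feature

# Step 1: independent sample split of the training data.
idx = rng.permutation(n)
split_a, split_b = idx[: n // 2], idx[n // 2:]

# Step 2: fit the "feature extractor" on split A only.
mu, sd = x[split_a].mean(axis=0), x[split_a].std(axis=0)
feats = lambda z: (z - mu) / sd

# Step 3: retrain only the classifier head on split B, with group-balanced
# resampling so the spurious attribute decorrelates from the label.
def balanced_resample(groups, rng):
    per_group = np.bincount(groups).min()
    picks = [rng.choice(np.where(groups == k)[0], per_group, replace=True)
             for k in np.unique(groups)]
    return np.concatenate(picks)

sel = split_b[balanced_resample(g[split_b], rng)]
xb, yb = feats(x[sel]), y[sel]

# Plain logistic regression by gradient descent (the retrained head).
w, bias = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(xb @ w + bias)))
    w -= 0.5 * (xb.T @ (p - yb)) / len(yb)
    bias -= 0.5 * (p - yb).mean()

# Worst-group accuracy over all data.
pred = (feats(x) @ w + bias > 0).astype(int)
worst = min((pred[g == k] == y[g == k]).mean() for k in range(4))
```

Because the balanced resample equalizes the four (label, attribute) groups, the retrained head places most of its weight on the stable feature rather than the spurious one, which is what drives the worst-group accuracy above chance in this toy setting.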

[1] Jinwoo Shin, et al. Spread Spurious Attribute: Improving Worst-group Accuracy with Spurious Attribute Estimation, 2022, ICLR.

[2] James Y. Zou, et al. Improving Out-of-Distribution Robustness via Selective Augmentation, 2022, ICML.

[3] Pradeep Ravikumar, et al. An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization, 2021, AISTATS.

[4] Daniel N. Barry, et al. A Too-Good-to-be-True Prior to Reduce Shortcut Reliance, 2021, Pattern Recognit. Lett.

[5] Shuicheng Yan, et al. Deep Long-Tailed Learning: A Survey, 2021, ArXiv.

[6] Chelsea Finn, et al. Just Train Twice: Improving Group Robustness without Training Group Information, 2021, ICML.

[7] Pang Wei Koh, et al. WILDS: A Benchmark of in-the-Wild Distribution Shifts, 2020, ICML.

[8] R. Zemel, et al. Environment Inference for Invariant Learning, 2020, ICML.

[9] Karan Goel, et al. Model Patching: Closing the Subgroup Performance Gap with Data Augmentation, 2020, ICLR.

[10] David Lopez-Paz, et al. In Search of Lost Domain Generalization, 2020, ICLR.

[11] Aaron C. Courville, et al. Out-of-Distribution Generalization via Risk Extrapolation (REx), 2020, ICML.

[12] John Duchi, et al. Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach, 2016, Math. Oper. Res.

[13] Ankit Singh Rawat, et al. Overparameterisation and worst-case generalisation: friend or foe?, 2021, ICLR.

[14] Christopher Ré, et al. No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems, 2020, NeurIPS.

[15] Vitaly Feldman, et al. What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation, 2020, NeurIPS.

[16] Jinwoo Shin, et al. Learning from Failure: Training Debiased Classifier from Biased Classifier, 2020, NeurIPS.

[17] Sergey Levine, et al. Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts?, 2020, ICML.

[18] Ed H. Chi, et al. Fairness without Demographics through Adversarially Reweighted Learning, 2020, NeurIPS.

[19] Pang Wei Koh, et al. An Investigation of Why Overparameterization Exacerbates Spurious Correlations, 2020, ICML.

[20] M. Bethge, et al. Shortcut learning in deep neural networks, 2020, Nature Machine Intelligence.

[21] Jacob Devlin, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[22] Saining Xie, et al. Decoupling Representation and Classifier for Long-Tailed Recognition, 2019, ICLR.

[23] Jared A. Dunnmon, et al. Hidden stratification causes clinically meaningful failures in machine learning for medical imaging, 2019, CHIL.

[24] Tatsunori B. Hashimoto, et al. Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization, 2020, ICLR.

[25] David Lopez-Paz, et al. Invariant Risk Minimization, 2019, ArXiv.

[26] Colin Wei, et al. Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss, 2019, NeurIPS.

[27] Lucy Vasserman, et al. Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification, 2019, WWW.

[28] Percy Liang, et al. Fairness Without Demographics in Repeated Loss Minimization, 2018, ICML.

[29] Bolei Zhou, et al. Places: A 10 Million Image Database for Scene Recognition, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.

[31] Yijing Li, et al. Learning from class-imbalanced data: Review of methods and applications, 2017, Expert Syst. Appl.

[32] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.

[33] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.

[34] Xiaogang Wang, et al. Deep Learning Face Attributes in the Wild, 2015, ICCV.

[35] Anja De Waegenaere, et al. Robust Solutions of Optimization Problems Affected by Uncertain Probabilities, 2011, Manag. Sci.

[36] Pietro Perona, et al. Caltech-UCSD Birds 200, 2010.

[37] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.