BARACK: Partially Supervised Group Robustness With Guarantees

While neural networks have shown remarkable success on classification tasks in terms of average-case performance, they often perform poorly on certain groups of the data. Group labels can be expensive to obtain, so recent work in robustness and fairness has proposed ways to improve worst-group performance even when group labels are unavailable for the training data. However, these methods generally underperform methods that use group information at training time. In this work, we assume access to a small number of group labels alongside a larger dataset without group labels. We propose BARACK, a simple two-step framework that uses this partial group information to improve worst-group performance: first, train a model to predict the missing group labels for the training data; second, use these predicted group labels in a robust optimization objective. Theoretically, we provide generalization bounds on the worst-group performance of our approach that scale with both the total number of training points and the number of training points with group labels. Empirically, our method outperforms baselines that do not use group information, even when only 1–33% of training points have group labels. Ablation studies support the robustness and extensibility of our framework.
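To make the two-step structure concrete, here is a minimal, stdlib-only sketch. It is not the authors' implementation. It assumes a nearest-centroid classifier as the group predictor in step 1 and, in step 2, the worst-group objective (the maximum over groups of the average per-example loss, which is what Group DRO minimizes) evaluated on the imputed group labels. The function names `fit_group_predictor`, `impute_groups`, and `worst_group_loss`, as well as the toy data, are hypothetical.

```python
from collections import defaultdict


def fit_group_predictor(features, groups):
    """Step 1 (assumed choice): nearest-centroid group classifier fit on the
    small subset of points that have group labels."""
    sums, counts = {}, defaultdict(int)
    for x, g in zip(features, groups):
        if g not in sums:
            sums[g] = [0.0] * len(x)
        sums[g] = [s + xi for s, xi in zip(sums[g], x)]
        counts[g] += 1
    centroids = {g: [s / counts[g] for s in sums[g]] for g in sums}

    def predict(x):
        # Assign x to the group whose centroid is closest in squared L2 distance.
        return min(
            centroids,
            key=lambda g: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[g])),
        )

    return predict


def impute_groups(features, groups, predict):
    """Keep known group labels; fill missing ones (None) with predictions."""
    return [g if g is not None else predict(x) for x, g in zip(features, groups)]


def worst_group_loss(losses, groups):
    """Step 2 (assumed objective): max over groups of the mean per-example loss."""
    totals, counts = defaultdict(float), defaultdict(int)
    for loss, g in zip(losses, groups):
        totals[g] += loss
        counts[g] += 1
    return max(totals[g] / counts[g] for g in totals)


# Toy usage: only 2 of 6 training points carry a group label.
features = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [0.2, 0.1], [4.8, 5.1]]
groups = [0, None, 1, None, None, None]
labeled = [(x, g) for x, g in zip(features, groups) if g is not None]
predict = fit_group_predictor([x for x, _ in labeled], [g for _, g in labeled])
full_groups = impute_groups(features, groups, predict)

# Hypothetical per-example losses from some classifier being trained.
losses = [0.1, 0.3, 1.0, 0.8, 0.2, 1.2]
worst = worst_group_loss(losses, full_groups)
```

In this toy run the imputed labels are `[0, 0, 1, 1, 0, 1]`, and the worst-group loss comes from group 1 (mean about 1.0 versus about 0.2 for group 0). In a real training loop, the robust step would minimize this worst-group quantity with respect to the classifier's parameters rather than just evaluate it.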
