Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification

We address the overlooked issue of unbiasedness in existing long-tailed classification methods. We find that their overall improvement is mostly attributable to a biased preference for the “tail” over the “head”, which pays off only because the test distribution is assumed to be balanced. However, when the test set is as imbalanced as the long-tailed training data (i.e., when the test distribution follows Zipf’s law, as natural data does), the “tail” bias is no longer beneficial overall, because it hurts the “head” majority classes. In this paper, we propose Cross-Domain Empirical Risk Minimization (xERM), which trains an unbiased model that achieves strong performance on both test distributions. This performance empirically demonstrates that xERM fundamentally improves classification by learning better feature representations rather than by playing the “head vs. tail” game. Drawing on causality, we further explain theoretically why xERM achieves unbiasedness: the bias caused by domain selection is removed by adjusting the empirical risks on the imbalanced domain and on the balanced but unseen domain. Code is available at https://github.com/BeierZhu/xERM.
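The claim that a tail bias helps on a balanced test set but hurts on a Zipf-distributed one comes down to simple arithmetic. Overall accuracy is the per-class accuracy averaged under the test-time class prior. The sketch below assumes a Zipf prior p_k ∝ 1/k (exponent s = 1) over four classes ordered from head to tail. It also uses made-up per-class accuracies for two hypothetical models. It is not an implementation of xERM, and the numbers are not results from the paper.

```python
import numpy as np


def zipf_prior(num_classes: int, s: float = 1.0) -> np.ndarray:
    """Class prior with p_k proportional to 1 / k**s; class 1 is the most frequent ("head")."""
    ranks = np.arange(1, num_classes + 1, dtype=float)
    weights = ranks ** (-s)
    return weights / weights.sum()


def overall_accuracy(per_class_acc: np.ndarray, prior: np.ndarray) -> float:
    """Expected accuracy on a test set whose class frequencies follow `prior`."""
    return float(np.dot(per_class_acc, prior))


# Hypothetical per-class accuracies, ordered head -> tail (illustrative only).
head_biased = np.array([0.9, 0.8, 0.5, 0.2])  # e.g. plain ERM on long-tailed data
tail_biased = np.array([0.7, 0.7, 0.6, 0.5])  # e.g. a re-balanced model favouring the tail

balanced = np.full(4, 0.25)  # balanced test distribution
zipf = zipf_prior(4)         # long-tailed test distribution: [0.48, 0.24, 0.16, 0.12]

for name, acc in [("head-biased", head_biased), ("tail-biased", tail_biased)]:
    print(f"{name}: balanced={overall_accuracy(acc, balanced):.3f}, "
          f"zipf={overall_accuracy(acc, zipf):.3f}")
```

With these toy numbers the tail-biased model wins under the balanced prior (0.625 vs. 0.600) but loses under the Zipf prior (0.660 vs. 0.728). This is the “head vs. tail” trade-off described in the abstract. A model that is better on both priors has to raise per-class accuracy rather than shift it from head to tail.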
