Targeted Data-driven Regularization for Out-of-Distribution Generalization

Due to biases introduced by large real-world datasets, deviations of deep learning models from their expected behavior on out-of-distribution test data are worrisome. Especially when data come from imbalanced or heavy-tailed label distributions, or minority groups of a sensitive feature. Classical approaches to address these biases are mostly data- or application-dependent, hence are burdensome to tune. Some meta-learning approaches, on the other hand, aim to learn hyperparameters in the learning process using different objective functions on training and validation data. However, these methods suffer from high computational complexity and are not scalable to large datasets. In this paper, we propose a unified data-driven regularization approach to learn a generalizable model from biased data. The proposed framework, named as targeted data-driven regularization (TDR), is model- and dataset-agnostic, and employs a target dataset that resembles the desired nature of test data in order to guide the learning process in a coupled manner. We cast the problem as a bilevel optimization and propose an efficient stochastic gradient descent based method to solve it. The framework can be utilized to alleviate various types of biases in real-world applications. We empirically show, on both synthetic and real-world datasets, the superior performance of TDR for resolving issues stem from these biases.

[1]  Percy Liang,et al.  Understanding Black-box Predictions via Influence Functions , 2017, ICML.

[2]  Shai Ben-David,et al.  Empirical Risk Minimization under Fairness Constraints , 2018, NeurIPS.

[3]  Maya R. Gupta,et al.  Satisfying Real-world Goals with Dataset Constraints , 2016, NIPS.

[4]  Saeed Ghadimi,et al.  Approximation Methods for Bilevel Programming , 2018, 1802.02246.

[5]  Paolo Frasconi,et al.  Bilevel Programming for Hyperparameter Optimization and Meta-Learning , 2018, ICML.

[6]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[7]  John Langford,et al.  A Reductions Approach to Fair Classification , 2018, ICML.

[8]  B. G. Manjunath,et al.  Monte Carlo Computations , 2016 .

[9]  James Zijun Wang,et al.  Skeleton matching with applications in severe weather detection , 2017, Appl. Soft Comput..

[10]  Kush R. Varshney,et al.  Optimized Pre-Processing for Discrimination Prevention , 2017, NIPS.

[11]  Kai Ming Ting,et al.  A Comparative Study of Cost-Sensitive Boosting Algorithms , 2000, ICML.

[12]  Daniel Freedman,et al.  SOSELETO: A Unified Approach to Transfer Learning and Training with Noisy Labels , 2018, ArXiv.

[13]  T. Başar,et al.  Dynamic Noncooperative Game Theory , 1982 .

[14]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Li Fei-Fei,et al.  MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels , 2017, ICML.

[16]  Alexei A. Efros,et al.  Test-Time Training for Out-of-Distribution Generalization , 2019, ArXiv.

[17]  Paolo Favaro,et al.  Deep Bilevel Learning , 2018, ECCV.

[18]  Aaron C. Courville,et al.  Out-of-Distribution Generalization via Risk Extrapolation (REx) , 2020, ICML.

[19]  Marcin Andrychowicz,et al.  Learning to learn by gradient descent by gradient descent , 2016, NIPS.

[20]  Toon Calders,et al.  Data preprocessing techniques for classification without discrimination , 2011, Knowledge and Information Systems.

[21]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[22]  Samy Bengio Sharing Representations for Long Tail Computer Vision Problems , 2015, ICMI.

[23]  Yang Song,et al.  Class-Balanced Loss Based on Effective Number of Samples , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Krishna P. Gummadi,et al.  Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment , 2016, WWW.

[25]  Patrice Marcotte,et al.  An overview of bilevel optimization , 2007, Ann. Oper. Res..

[26]  David Lopez-Paz,et al.  Invariant Risk Minimization , 2019, ArXiv.

[27]  Charles Elkan,et al.  The Foundations of Cost-Sensitive Learning , 2001, IJCAI.

[28]  T. Başar,et al.  Dynamic Noncooperative Game Theory, 2nd Edition , 1998 .

[29]  Siddhartha Chaudhuri,et al.  Generalizing Across Domains via Cross-Gradient Training , 2018, ICLR.

[30]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[31]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[32]  Atsuto Maki,et al.  A systematic study of the class imbalance problem in convolutional neural networks , 2017, Neural Networks.

[33]  Karthik Sridharan,et al.  Two-Player Games for Efficient Non-Convex Constrained Optimization , 2018, ALT.

[34]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[35]  Bin Yang,et al.  Learning to Reweight Examples for Robust Deep Learning , 2018, ICML.

[36]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[37]  Chen Huang,et al.  Learning Deep Representation for Imbalanced Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Xiaogang Wang,et al.  Factors in Finetuning Deep Model for Object Detection with Long-Tail Distribution , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Nathan Srebro,et al.  Equality of Opportunity in Supervised Learning , 2016, NIPS.

[40]  Farzin Haddadpour,et al.  Efficient Fair Principal Component Analysis , 2019, ArXiv.

[41]  H. Kahn,et al.  Methods of Reducing Sample Size in Monte Carlo Computations , 1953, Oper. Res..

[42]  Xingrui Yu,et al.  Co-teaching: Robust training of deep neural networks with extremely noisy labels , 2018, NeurIPS.

[43]  Mohammed Bennamoun,et al.  Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[44]  Mehrdad Mahdavi,et al.  Targeted Meta-Learning for Critical Incident Detection in Weather Data , 2019 .

[45]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[46]  Jon M. Kleinberg,et al.  On Fairness and Calibration , 2017, NIPS.

[47]  James Zijun Wang,et al.  2016 Ieee International Conference on Big Data (big Data) Shape Matching Using Skeleton Context for Automated Bow Echo Detection , 2022 .

[48]  Amos J. Storkey,et al.  Data Augmentation Generative Adversarial Networks , 2017, ICLR 2018.