Stable Learning via Causality-based Feature Rectification

How to learn a stable model under agnostic distribution shift between training and testing datasets is an essential problem in machine learning. Agnostic distribution shift caused by data generation bias can lead to model misspecification and unstable performance across different test datasets. Most recently proposed methods are causality-based sample reweighting methods, whose performance is sensitive to sample size; moreover, they are restricted to linear models and do not extend to deep-learning-based nonlinear models. In this work, we propose a novel Causality-based Feature Rectification (CFR) method that addresses model misspecification under agnostic distribution shift by rectifying features with a weight matrix. Our proposal is based on the fact that the causality between stable features and the ground truth is consistent under agnostic distribution shift, but is partly omitted and statistically correlated with other features. We propose a feature rectification weight matrix that reconstructs the omitted causality by using other features as proxy variables, and further propose an algorithm that jointly optimizes the weight matrix and the regressor (or classifier). Our proposal improves the stability not only of linear models but also of deep-learning-based models. Extensive experiments on both synthetic and real-world datasets demonstrate that our proposal outperforms previous state-of-the-art stable learning methods. The code will be released later on.
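The joint optimization described above can be illustrated with a minimal sketch: features are rectified by a learned matrix before being fed to a linear regressor, and the two are updated jointly by gradient descent. The loss, initialization, and hyperparameters below are illustrative assumptions, not the authors' exact objective.

```python
import numpy as np

# Hypothetical sketch of the CFR idea: learn a rectification matrix W and a
# regressor beta jointly, so predictions use the rectified features X @ W.
# The squared-error objective here is an assumption for illustration only.

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
true_beta = np.array([1.5, -2.0, 0.0, 0.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=n)

W = np.eye(d)       # feature rectification matrix, initialized as identity
beta = np.zeros(d)  # linear regressor weights
lr = 0.01

for _ in range(2000):
    Z = X @ W                                # rectified features
    resid = Z @ beta - y
    grad_beta = Z.T @ resid / n              # gradient w.r.t. regressor
    grad_W = X.T @ np.outer(resid, beta) / n # gradient w.r.t. rectification
    beta -= lr * grad_beta                   # joint update of both components
    W -= lr * grad_W

mse = float(np.mean((X @ W @ beta - y) ** 2))
print(round(mse, 4))
```

In a linear setting the product `W @ beta` collapses to a single weight vector; the separation becomes meaningful when `W` is constrained or regularized (e.g., toward reconstructing omitted causal effects via proxy features, as the abstract describes), or when the regressor is replaced by a nonlinear network.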
