Stable Learning via Differentiated Variable Decorrelation

Recently, as applications of artificial intelligence gradually seep into risk-sensitive areas such as justice, healthcare, and autonomous driving, an upsurge of research interest in model stability and robustness has arisen in the field of machine learning. Rather than purely fitting the observed training data, stable learning tries to learn a model with uniformly good performance under non-stationary and agnostic testing data. The key challenge for stable learning in practice is that we have no a priori knowledge of the true model or the test data distribution. Under such conditions, we cannot expect a faithful estimation of model parameters or stability over wildly changing environments. Previous methods resort to a reweighting scheme that removes the correlations between all variables through a set of new sample weights. However, we argue that such aggressive decorrelation between all variables may over-reduce the effective sample size, leading to variance inflation and possible underperformance. In this paper, we incorporate unlabeled data from multiple environments into the variable decorrelation framework and propose a Differentiated Variable Decorrelation (DVD) algorithm based on the clustering of variables. Specifically, the variables are clustered according to the stability of their correlations across environments, and the variable decorrelation module learns a set of sample weights that removes the correlations only between variables of different clusters. Empirical studies on both synthetic and real-world datasets clearly demonstrate the efficacy of our DVD algorithm in improving model parameter estimation and prediction stability over changing distributions.
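The abstract describes DVD's two stages (cluster variables by correlation stability across environments, then decorrelate only across clusters) without formulas. The following minimal Python sketch illustrates one plausible instantiation under stated assumptions: stability is measured as the standard deviation of pairwise correlations across environments, clustering uses k-means, and the weights minimize squared weighted cross-cluster covariances with a penalty that keeps weights near uniform. The function names, the stability measure, and the penalty form are all assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.cluster import KMeans

def cluster_by_correlation_stability(envs, n_clusters=2):
    """Cluster variables by how much their pairwise correlations vary
    across environments (hypothetical criterion inferred from the
    abstract; the paper's exact stability measure may differ)."""
    corrs = np.stack([np.corrcoef(X, rowvar=False) for X in envs])
    instability = corrs.std(axis=0)      # (p, p): per-pair variation
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(instability)

def learn_decorrelating_weights(X, labels, lam=1.0):
    """Learn nonnegative sample weights that shrink weighted
    correlations only between variables in *different* clusters."""
    n, p = X.shape
    cross = labels[:, None] != labels[None, :]  # cross-cluster pairs

    def loss(v):
        w = v ** 2                            # enforce w >= 0
        w = w / w.sum() * n                   # normalize to sum n
        Xc = X - (w[:, None] * X).sum(0) / n  # weighted centering
        cov = (w[:, None] * Xc).T @ Xc / n    # weighted covariance
        # Penalize only cross-cluster covariances; the second term
        # keeps weights near 1 to preserve effective sample size.
        return (cov[cross] ** 2).sum() + lam * ((w - 1) ** 2).mean()

    res = minimize(loss, np.ones(n), method="L-BFGS-B")
    w = res.x ** 2
    return w / w.sum() * n
```

The uniformity penalty reflects the abstract's central argument: decorrelating fewer variable pairs allows weights closer to uniform, avoiding the effective-sample-size collapse that full decorrelation can cause.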
