Stable Prediction with Model Misspecification and Agnostic Distribution Shift

For many machine learning algorithms, two main assumptions are required to guarantee performance. One is that the test data are drawn from the same distribution as the training data, and the other is that the model is correctly specified. In real applications, however, we often have little prior knowledge of the test data and of the underlying true model. Under model misspecification, agnostic distribution shift between training and test data leads to inaccurate parameter estimation and unstable prediction across unknown test data. To address these problems, we propose a novel Decorrelated Weighting Regression (DWR) algorithm, which jointly optimizes a variable decorrelation regularizer and a weighted regression model. The variable decorrelation regularizer estimates a weight for each sample such that variables are decorrelated on the weighted training data. These weights are then used in the weighted regression to improve the accuracy of estimating the effect of each variable, thereby improving the stability of prediction across unknown test data. Extensive experiments clearly demonstrate that our DWR algorithm can significantly improve the accuracy of parameter estimation and the stability of prediction under model misspecification and agnostic distribution shift.
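The joint procedure the abstract describes can be sketched in two steps: learn per-sample weights that shrink the weighted cross-covariances between variables, then fit a weighted least-squares regression with those weights. The following is a minimal illustrative sketch, not the authors' implementation: the toy data, the simplex-projected gradient updates, and all hyperparameters (step size, iteration count) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): three deliberately correlated covariates
# and a linear outcome, standing in for the training distribution.
n, p = 500, 3
A = np.array([[1.0, 0.6, 0.3],
              [0.0, 1.0, 0.6],
              [0.0, 0.0, 1.0]])
X = rng.normal(size=(n, p)) @ A          # columns are correlated by design
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

def decorr_loss_grad(w, X):
    """Sum of squared weighted off-diagonal covariances, and its gradient in w."""
    m = w @ X                                      # weighted column means
    C = (X * w[:, None]).T @ X - np.outer(m, m)    # weighted covariance matrix
    off = C.copy()
    np.fill_diagonal(off, 0.0)                     # penalize cross-covariances only
    loss = 0.5 * np.sum(off ** 2)
    # d C_ij / d w_k = x_ki x_kj - x_ki m_j - m_i x_kj, contracted with off:
    grad = np.sum((X @ off) * X, axis=1) - 2.0 * (X @ (off @ m))
    return loss, grad

# Step 1: projected gradient descent over sample weights on the simplex
# (weights kept nonnegative and renormalized to sum to one each step).
w = np.full(n, 1.0 / n)
init_loss = decorr_loss_grad(w, X)[0]
best_loss, best_w = init_loss, w
for _ in range(500):
    loss, grad = decorr_loss_grad(w, X)
    if loss < best_loss:
        best_loss, best_w = loss, w
    w = np.clip(w - 1e-4 * grad, 1e-12, None)
    w /= w.sum()

# Step 2: weighted least squares with the learned decorrelating weights.
W = best_w
beta_dwr = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (W * y))
print("decorrelation loss:", init_loss, "->", best_loss)
print("weighted-regression coefficients:", beta_dwr)
```

The paper optimizes the regularizer and the regression jointly; the sketch above alternates them only once for brevity, which already shows the mechanism: as the weighted cross-covariances shrink, each coefficient in the weighted regression depends less on the other variables' distribution, which is what stabilizes prediction under shift.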
