Debiasing Linear Prediction

Standard methods in supervised learning separate training and prediction: the model is fit independently of any test points it may encounter. But can knowledge of the next test point $\mathbf{x}_{\star}$ be exploited to improve prediction accuracy? We address this question in the context of linear prediction, showing how techniques from semi-parametric inference can be used transductively to combat regularization bias. We first lower bound the $\mathbf{x}_{\star}$ prediction error of ridge regression and the Lasso, showing that they must incur significant bias in certain test directions. We then provide non-asymptotic upper bounds on the $\mathbf{x}_{\star}$ prediction error of two transductive prediction rules. We conclude by demonstrating the efficacy of our methods on both synthetic and real data, highlighting the improvements that single-point transductive prediction can provide in settings with distribution shift.
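As a concrete (if simplified) illustration of how knowledge of $\mathbf{x}_{\star}$ can be used to combat regularization bias, the following sketch applies a plug-in bias correction to the ridge prediction at a single test point. It relies only on the closed-form bias of the ridge estimator; the helper `debiased_ridge_predict` and this particular plug-in correction are our own illustrative assumptions, not the paper's two transductive prediction rules.

```python
import numpy as np

def debiased_ridge_predict(X, y, x_star, lam):
    """Predict y at x_star with a plug-in correction for ridge bias.

    Ridge: beta_hat = A^{-1} X^T y with A = X^T X + lam * I, so
    E[beta_hat] = beta - lam * A^{-1} beta, and the naive prediction
    x_star^T beta_hat is biased by -lam * x_star^T A^{-1} beta.
    We correct with the plug-in estimate lam * x_star^T A^{-1} beta_hat.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    beta_hat = np.linalg.solve(A, X.T @ y)                  # ridge estimate
    naive = x_star @ beta_hat                               # standard ridge prediction
    correction = lam * (x_star @ np.linalg.solve(A, beta_hat))
    return naive + correction

# Toy usage: compare the corrected prediction to the noiseless target.
rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.standard_normal((n, d))
beta = rng.standard_normal(d)
y = X @ beta + 0.1 * rng.standard_normal(n)
x_star = rng.standard_normal(d)
print(debiased_ridge_predict(X, y, x_star, lam=5.0), x_star @ beta)
```

Because the correction is evaluated along the specific direction $\mathbf{x}_{\star}$, it targets exactly the bias the lower bounds above identify, rather than attempting to debias the full coefficient vector.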
