Distribution-Free Predictive Inference for Regression

ABSTRACT

We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, to adapt to heteroscedasticity in the data. Finally, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying this article is an R package conformalInference that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.
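
To make the split conformal variant concrete, here is a minimal sketch in R. It is an illustration under simple assumptions (absolute residuals as conformity scores, a linear model as the base estimator, a single random data split); it is not the API of the accompanying conformalInference package, and the function name split_conformal is hypothetical. Under exchangeability of the data, the returned band C(x) satisfies P(Y_{n+1} in C(X_{n+1})) >= 1 - alpha in finite samples, whether or not the linear model is correct.

```r
# Minimal split conformal sketch (illustrative; not the conformalInference API).
split_conformal <- function(x, y, x0, alpha = 0.1) {
  n   <- nrow(x)
  idx <- sample(n, floor(n / 2))                            # random split into two halves
  df1 <- data.frame(y = y[idx],  x[idx, , drop = FALSE])    # fitting half
  df2 <- data.frame(y = y[-idx], x[-idx, , drop = FALSE])   # ranking half
  fit <- lm(y ~ ., data = df1)                              # any regression estimator could be used here
  res <- abs(df2$y - predict(fit, newdata = df2))           # holdout absolute residuals
  k   <- ceiling((nrow(df2) + 1) * (1 - alpha))             # conformal quantile index
  d   <- sort(res)[min(k, nrow(df2))]                       # split conformal margin
  mu0 <- predict(fit, newdata = data.frame(x0))
  cbind(lower = mu0 - d, upper = mu0 + d)                   # band: mu_hat(x0) +/- d
}

# Example usage on simulated data:
set.seed(1)
x <- data.frame(x1 = rnorm(200))
y <- 2 * x$x1 + rnorm(200)
split_conformal(x, y, x0 = data.frame(x1 = c(-1, 0, 1)))
```

Only a single model fit is required, which is why split conformal is far cheaper computationally than full conformal inference, where the estimator must be refit for every candidate response value at every query point.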
