Sparse conformal predictors

Conformal predictors, introduced by Vovk et al. (Algorithmic Learning in a Random World, Springer, New York, 2005), serve to build prediction intervals by exploiting a notion of conformity of the new data point with previously observed data. We propose a novel method for constructing prediction intervals for the response variable in multivariate linear models. The main emphasis is on sparse linear models, where only a few of the covariates have a significant influence on the response variable, even though the total number of covariates may be very large. Our approach combines the principle of conformal prediction with the ℓ1-penalized least squares estimator (LASSO). The resulting confidence set depends on a parameter ε > 0 and has coverage probability greater than or equal to 1 − ε. The numerical experiments reported in the paper show that the length of the confidence set is small. Furthermore, as a by-product of the proposed approach, we provide a data-driven procedure for choosing the LASSO penalty. The selection power of the method is illustrated on simulated and real data.
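
To make the construction concrete, here is a minimal Python sketch of a full conformal predictor built on the LASSO, in the spirit of the approach described above: each candidate response z on a grid is tentatively appended to the sample together with the new covariate vector, the LASSO is refit, absolute residuals serve as conformity scores, and z is retained whenever its conformal p-value exceeds ε. The grid range, the penalty level alpha, and the helper name conformal_interval are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch of a sparse conformal predictor: full conformal
# prediction with the LASSO as the underlying estimator. The grid,
# the penalty level `alpha`, and the function name are assumptions
# made for illustration only.
import numpy as np
from sklearn.linear_model import Lasso

def conformal_interval(X, y, x_new, eps=0.1, alpha=0.1, grid_size=200):
    """Conformal prediction set with coverage probability >= 1 - eps."""
    spread = y.max() - y.min()
    # Heuristic grid of candidate values for the unknown response.
    grid = np.linspace(y.min() - spread, y.max() + spread, grid_size)
    accepted = []
    for z in grid:
        # Augment the sample with the trial pair (x_new, z) and refit.
        X_aug = np.vstack([X, x_new])
        y_aug = np.append(y, z)
        model = Lasso(alpha=alpha).fit(X_aug, y_aug)
        # Conformity scores: absolute residuals on the augmented sample.
        scores = np.abs(y_aug - model.predict(X_aug))
        # Conformal p-value: fraction of points that conform no better
        # than the trial pair (whose score is the last entry).
        if np.mean(scores >= scores[-1]) > eps:
            accepted.append(z)
    return (min(accepted), max(accepted)) if accepted else None

# Example on simulated sparse data: 100 covariates, only 3 active.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))
beta = np.zeros(100)
beta[:3] = 2.0
y = X @ beta + rng.standard_normal(50)
print(conformal_interval(X, y, X[0], eps=0.1))
```

Under exchangeability of the data points, the conformal p-value of the true response is valid (its distribution is dominated by the uniform), which is what yields the advertised coverage of at least 1 − ε. The same machinery also suggests one natural reading of the data-driven penalty choice mentioned above, namely picking, among candidate values of alpha, the one producing the shortest interval at the desired coverage level; this is an illustrative interpretation, not necessarily the paper's exact procedure.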
