ℓ1-regularized linear regression: persistence and oracle inequalities

We study the predictive performance of ℓ1-regularized linear regression in a model-free setting, including the case where the number of covariates is substantially larger than the sample size. We introduce a new analysis method that avoids the boundedness problems that typically arise in model-free empirical minimization. Our technique provides an answer to a conjecture of Greenshtein and Ritov (Bernoulli 10(6):971–988, 2004) regarding the “persistence” rate for linear regression, and it allows us to prove an oracle inequality for the error of the regularized minimizer. It also shows that empirical risk minimization achieves the optimal rates (up to logarithmic factors) for convex aggregation of a set of estimators of a regression function.
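
For concreteness, the following is a minimal sketch of the objects involved, written in standard notation that need not match the paper's own: the ℓ1-constrained least squares estimator and the persistence criterion of Greenshtein and Ritov (2004).

```latex
% Setting: i.i.d. pairs (X_i, Y_i) in R^p x R; no model is assumed for Y given X.
% (Snippet assumes amsmath for \operatorname* and \xrightarrow.)
% Population risk and empirical risk of a linear predictor \beta:
\[
  R(\beta) = \mathbb{E}\bigl(Y - \langle \beta, X \rangle\bigr)^2,
  \qquad
  \widehat{R}_n(\beta) = \frac{1}{n} \sum_{i=1}^{n}
    \bigl(Y_i - \langle \beta, X_i \rangle\bigr)^2 .
\]
% The l1-constrained empirical minimizer over a ball of radius b_n
% (the constrained form of the Lasso):
\[
  \widehat{\beta}_n \in \operatorname*{arg\,min}_{\|\beta\|_1 \le b_n}
    \widehat{R}_n(\beta) .
\]
% Persistence: the sequence (\widehat{\beta}_n) is persistent relative to the
% balls B_n = \{\beta : \|\beta\|_1 \le b_n\} if its excess risk vanishes
% in probability,
\[
  R(\widehat{\beta}_n) - \inf_{\|\beta\|_1 \le b_n} R(\beta)
  \;\xrightarrow{\;\mathbb{P}\;}\; 0 ;
\]
% the persistence rate asks how quickly b_n may grow with n while this
% convergence still holds.
```

In this language, an oracle inequality bounds the risk R(β̂_n) by the best risk attainable over the ℓ1 ball plus an explicit remainder term, with high probability.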

[1] W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, 1963.

[2] R. Dudley, The Sizes of Compact Subsets of Hilbert Space and Continuity of Gaussian Processes, 1967.

[3] G. Pisier, Some Applications of the Metric Entropy Condition to Harmonic Analysis, 1983.

[4] E. Giné et al., Some Limit Theorems for Empirical Processes, 1984.

[5] B. Carl, Inequalities of Bernstein-Jackson-Type and the Degree of Compactness of Operators in Banach Spaces, 1985.

[6] V. Milman et al., Asymptotic Theory of Finite Dimensional Normed Spaces, 1986.

[7] G. Pisier, Asymptotic Theory of Finite Dimensional Normed Spaces (Lecture Notes in Mathematics 1200), 1987.

[8] M. Talagrand, Regularity of Gaussian Processes, 1987.

[9] G. Pisier, The Volume of Convex Bodies and Banach Space Geometry, 1989.

[10] B. Bollobás, The Volume of Convex Bodies and Banach Space Geometry (Cambridge Tracts in Mathematics 94), 1991.

[11] I. Johnstone et al., Ideal Spatial Adaptation by Wavelet Shrinkage, 1994.

[12] M. Talagrand, Sharper Bounds for Gaussian and Empirical Processes, 1994.

[13] P. L. Bartlett et al., Efficient Agnostic Learning of Neural Networks with Bounded Fan-in, IEEE Trans. Inf. Theory, 1996.

[14] J. A. Wellner et al., Weak Convergence and Empirical Processes: With Applications to Statistics, 1996.

[15] R. Tibshirani, Regression Shrinkage and Selection via the Lasso, 1996.

[16] P. Brézillon et al., Lecture Notes in Artificial Intelligence, 1999.

[17] F. Cucker et al., On the Mathematical Foundations of Learning, 2001.

[18] S. Mendelson, Improving the Sample Complexity Using Global Data, IEEE Trans. Inf. Theory, 2002.

[19] S. Mendelson, On the Performance of Kernel Classes, J. Mach. Learn. Res., 2003.

[20] A. B. Tsybakov, Optimal Rates of Aggregation, COLT, 2003.

[21] O. Catoni, Statistical Learning Theory and Stochastic Optimization, 2004.

[22] J. Picard et al., Statistical Learning Theory and Stochastic Optimization: École d'été de Probabilités de Saint-Flour XXXI - 2001, 2004.

[23] E. Greenshtein and Y. Ritov, Persistence in High-Dimensional Linear Predictor Selection and the Virtue of Overparametrization, Bernoulli 10(6):971–988, 2004.

[24] P. D. Plowright et al., Convexity, Optimization for Chemical and Biochemical Engineering, 2019.

[25] M. Talagrand, The Generic Chaining, 2005.

[26] M. Talagrand, The Generic Chaining: Upper and Lower Bounds of Stochastic Processes, 2005.

[27] P. Bartlett et al., Empirical Minimization, 2006.

[28] M. I. Jordan et al., Convexity, Classification, and Risk Bounds, 2006.

[29] G. Paouris, Concentration of Mass on Convex Bodies, 2006.

[30] M. Elad et al., Stable Recovery of Sparse Overcomplete Representations in the Presence of Noise, IEEE Trans. Inf. Theory, 2006.

[31] N. Meinshausen et al., High-Dimensional Graphs and Variable Selection with the Lasso, 2006, arXiv:math/0608017.

[32] E. Greenshtein, Best Subset Selection, Persistence in High-Dimensional Statistical Learning and Optimization under ℓ1 Constraint, 2006, arXiv:math/0702684.

[33] F. Bunea et al., Aggregation and Sparsity via ℓ1 Penalized Least Squares, 2006.

[34] G. Wahba et al., A Note on the Lasso and Related Procedures in Model Selection, 2006.

[35] S. Mendelson et al., Gaussian Averages of Interpolated Bodies and Applications to Approximate Reconstruction, J. Approx. Theory, 2007.

[36] S. Mendelson et al., Subspaces and Orthogonal Decompositions Generated by Bounded Orthogonal Systems, 2007.

[37] A. Tsybakov et al., Sparsity Oracle Inequalities for the Lasso, 2007, arXiv:0705.3308.

[38] S. Mendelson et al., Majorizing Measures and Proportional Subsets of Bounded Orthonormal Systems, 2008, arXiv:0801.3556.

[39] S. van de Geer, High-Dimensional Generalized Linear Models and the Lasso, 2008, arXiv:0804.0703.

[40] K. Lounici, Sup-Norm Convergence Rate and Sign Concentration Property of Lasso and Dantzig Estimators, 2008, arXiv:0801.4610.

[41] P. Bartlett, Fast Rates for Estimation Error and Oracle Inequalities for Model Selection, Econometric Theory, 2008.

[42] C.-H. Zhang et al., The Sparsity and Bias of the Lasso Selection in High-Dimensional Linear Regression, 2008, arXiv:0808.0967.

[43] N. Meinshausen et al., Lasso-Type Recovery of Sparse Representations for High-Dimensional Data, 2008, arXiv:0806.0145.

[44] E. Candès et al., Near-Ideal Model Selection by ℓ1 Minimization, 2008, arXiv:0801.0345.

[45] T. Zhang, Some Sharp Performance Bounds for Least Squares Regression with L1 Regularization, 2009, arXiv:0908.2869.

[46] V. Koltchinskii, Sparsity in Penalized Empirical Risk Minimization, 2009.

[47] P. Bickel et al., Simultaneous Analysis of Lasso and Dantzig Selector, 2008, arXiv:0801.1095.

[48] S. Mendelson et al., Aggregation via Empirical Risk Minimization, 2009.

[49] S. Mendelson et al., Regularization in Kernel Learning, 2010, arXiv:1001.2094.

[50] R. Tibshirani et al., Regression Shrinkage and Selection via the Lasso: A Retrospective, 2011.