Minimum 𝓁1-norm interpolation via basis pursuit is robust to errors

This article studies basis pursuit, i.e. minimum $\ell_1$-norm interpolation, in sparse linear regression with additive errors. No conditions on the errors are imposed. It is assumed that the number of i.i.d. Gaussian features grows superlinearly in the number of samples. The main result is that, under these conditions, the Euclidean error of recovering the true regressor is of the order of the average noise level. Hence, the regressor recovered by basis pursuit is close to the truth whenever the average noise level is small. Lower bounds showing near-optimality of these results complement the analysis. In addition, the results are extended to low-rank trace regression. The proofs rely on new lower tail bounds for maxima of Gaussian vectors and for the spectral norm of Gaussian matrices, respectively, which may be of independent interest, as they are significantly stronger than the corresponding upper tail bounds.
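
For concreteness, the estimator and model described above can be written down explicitly. The display below is a sketch added for orientation: the model $y = X\beta^* + \xi$ with arbitrary errors $\xi$ and the basis pursuit program follow the abstract, while reading the "average noise level" as $\|\xi\|_2/\sqrt{n}$ in the stated guarantee is an interpretation, not a verbatim statement of the paper's theorem.

$$
y = X\beta^* + \xi, \qquad X \in \mathbb{R}^{n \times p} \text{ with i.i.d. Gaussian entries}, \quad p \gg n, \quad \xi \in \mathbb{R}^n \text{ arbitrary},
$$

$$
\hat{\beta} \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \|\beta\|_1 \ \text{ subject to } \ X\beta = y, \qquad \|\hat{\beta} - \beta^*\|_2 \ \lesssim \ \frac{\|\xi\|_2}{\sqrt{n}}.
$$

In the low-rank trace regression extension, the $\ell_1$-norm is replaced by the nuclear norm, i.e. minimum nuclear-norm interpolation of the linear measurements; the precise statement is analogous and not reproduced here.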
