Precise Error Analysis of Regularized $M$-Estimators in High Dimensions

A popular approach to estimating an unknown signal <inline-formula> <tex-math notation="LaTeX">$ \mathbf {x}_{0}\in \mathbb {R} ^{n}$ </tex-math></inline-formula> from noisy linear measurements <inline-formula> <tex-math notation="LaTeX">$ \mathbf {y}= \mathbf {A} \mathbf {x} _{0}+ \mathbf {z}\in \mathbb {R}^{m}$ </tex-math></inline-formula> is to solve a so-called <italic>regularized</italic> <inline-formula> <tex-math notation="LaTeX">$M$ </tex-math></inline-formula>-estimator: <inline-formula> <tex-math notation="LaTeX">$\hat{\mathbf {x}} :=\arg \min _{\mathbf {x}} \mathcal {L} (\mathbf {y}- \mathbf {A} \mathbf {x})+\lambda f(\mathbf {x})$ </tex-math></inline-formula>. Here, <inline-formula> <tex-math notation="LaTeX">$ \mathcal {L}$ </tex-math></inline-formula> is a convex loss function, <inline-formula> <tex-math notation="LaTeX">$f$ </tex-math></inline-formula> is a convex (typically non-smooth) regularizer, and <inline-formula> <tex-math notation="LaTeX">$\lambda > 0$ </tex-math></inline-formula> is a regularization parameter. We analyze the squared-error performance <inline-formula> <tex-math notation="LaTeX">$\|\hat{\mathbf {x}} - \mathbf {x}_{0}\|_{2}^{2}$ </tex-math></inline-formula> of such estimators in the <italic>high-dimensional proportional regime</italic>, where <inline-formula> <tex-math notation="LaTeX">$m,n\rightarrow \infty $ </tex-math></inline-formula> and <inline-formula> <tex-math notation="LaTeX">$m/n\rightarrow \delta $ </tex-math></inline-formula>. The design matrix <inline-formula> <tex-math notation="LaTeX">$ \mathbf {A}$ </tex-math></inline-formula> is assumed to have iid Gaussian entries; only mild regularity conditions are imposed on the loss function, the regularizer, and the noise and signal distributions. We show that the squared error converges in probability to a nontrivial limit, given as the solution to a convex-concave minimax optimization problem over four scalar variables.
We identify a new summary parameter, termed the <italic>expected Moreau envelope</italic>, that plays a central role in the error characterization. The <italic>precise</italic> nature of the results permits an accurate performance comparison between different instances of regularized <inline-formula> <tex-math notation="LaTeX">$M$ </tex-math></inline-formula>-estimators and allows optimal tuning of the involved parameters (such as the regularization parameter and the number of measurements). The key ingredient of our proof is the <italic>convex Gaussian min-max theorem</italic>, which is a tight and strengthened version of a classical Gaussian comparison inequality proved by Gordon in 1988.
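
As a rough illustration of the setup described above, the following Python sketch solves one instance of the regularized M-estimator at finite dimensions and reports its squared error. Everything specific here is an assumption made for the example and not taken from the paper: the squared loss and the l1 regularizer (i.e. the LASSO), the proximal-gradient (ISTA) solver, the 1/sqrt(m) scaling of the Gaussian design, the sparse Gaussian signal, the noise level, and all function and variable names. The helper `moreau_envelope_l1` evaluates the standard Moreau envelope of the l1 norm for a given vector; it is not the paper's expected Moreau envelope, which additionally involves an expectation.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def moreau_envelope_l1(x, tau):
    """Moreau envelope of the l1 norm: min_v ||x - v||^2 / (2*tau) + ||v||_1.

    The minimizer is the soft-thresholded vector, so the value is available
    in closed form (it coincides with a sum of Huber functions).
    """
    v = soft_threshold(x, tau)
    return np.sum((x - v) ** 2) / (2.0 * tau) + np.sum(np.abs(v))


def objective(A, y, x, lam):
    """Squared loss plus l1 penalty: 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))


def lasso_ista(A, y, lam, n_iter=500):
    """Minimize the objective above by proximal gradient descent (ISTA)."""
    # Step size 1/L, where L = ||A||_2^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x


# Hypothetical problem instance (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n, delta, sparsity, lam = 200, 0.7, 0.1, 0.05
m = int(delta * n)                                 # m / n is roughly delta
x0 = rng.standard_normal(n) * (rng.random(n) < sparsity)
A = rng.standard_normal((m, n)) / np.sqrt(m)       # iid Gaussian design (assumed scaling)
z = 0.1 * rng.standard_normal(m)                   # Gaussian noise (assumed)
y = A @ x0 + z

x_hat = lasso_ista(A, y, lam)
sq_err = np.sum((x_hat - x0) ** 2)
print("squared error:", sq_err)
```

Repeating the experiment over several values of `lam` or `delta` gives an empirical error curve; the asymptotic limit stated in the abstract concerns such curves as the dimensions grow, and this sketch makes no attempt to compute that limit.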
