Second-order Stein: SURE for SURE and other applications in high-dimensional inference

Stein's formula states that a random variable of the form $z^\top f(z) - \operatorname{div} f(z)$ is mean-zero for all functions $f$ with integrable gradient, where $\operatorname{div} f$ is the divergence of $f$ and $z$ is a standard normal vector. A Second Order Stein formula is proposed to characterize the variance of such random variables. In the Gaussian sequence model, a remarkable consequence of Stein's formula is Stein's Unbiased Risk Estimate (SURE) of the mean squared risk of almost any given estimator $\hat\mu$ of the unknown mean vector. A first application of the Second Order Stein formula is an Unbiased Risk Estimate of the risk of SURE itself (SURE for SURE): a simple unbiased estimate provides information about the squared distance between SURE and the squared estimation error of $\hat\mu$. SURE for SURE has a simple form and can be computed explicitly for differentiable $\hat\mu$, for example the Lasso and the Elastic Net. Further applications of the Second Order Stein formula are given in high-dimensional regression, including novel bounds on the variance of the size of the model selected by the Lasso, and a general semi-parametric scheme to de-bias an almost differentiable initial estimator in order to estimate a low-dimensional projection of the unknown regression coefficient vector.
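As a concrete illustration of SURE in the Gaussian sequence model, the following minimal Python sketch (not from the paper) checks the unbiasedness of SURE for soft thresholding, which is the Lasso in this model. The threshold `lam`, the sparse mean `mu`, and the replication count are illustrative assumptions chosen for the demonstration only.

```python
# Monte Carlo check that SURE is unbiased for the squared estimation error
# of soft thresholding in the Gaussian sequence model y = mu + z, z ~ N(0, I_n).
import numpy as np

rng = np.random.default_rng(0)
n, lam = 500, 1.0
mu = np.concatenate([np.full(25, 3.0), np.zeros(n - 25)])  # sparse mean (assumed)

def soft_threshold(y, lam):
    """Soft thresholding: the Lasso solution in the sequence model."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure(y, lam):
    """SURE for soft thresholding with noise variance 1:
    ||y - mu_hat||^2 - n + 2 * div mu_hat(y),
    where div mu_hat(y) = #{i : |y_i| > lam}."""
    mu_hat = soft_threshold(y, lam)
    df = np.count_nonzero(np.abs(y) > lam)  # divergence term
    return np.sum((y - mu_hat) ** 2) - y.size + 2 * df

risks, sures = [], []
for _ in range(2000):
    y = mu + rng.standard_normal(n)
    risks.append(np.sum((soft_threshold(y, lam) - mu) ** 2))
    sures.append(sure(y, lam))

# Unbiasedness: the two averages should agree up to Monte Carlo error.
print(f"mean squared error: {np.mean(risks):.2f}")
print(f"mean of SURE:       {np.mean(sures):.2f}")
```

Up to Monte Carlo error the two printed averages agree, which is the unbiasedness of SURE. The Second Order Stein formula goes one step further: SURE for SURE gives an unbiased estimate of the expected squared distance between SURE and the squared error $\|\hat\mu - \mu\|^2$, quantifying how far SURE can be from the true error on a single draw.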
