Sparse matrix inversion with scaled Lasso

We propose a new method of learning a sparse nonnegative-definite target matrix. Our primary example of the target matrix is the inverse of a population covariance or correlation matrix. The algorithm first estimates each column of the target matrix by the scaled Lasso and then adjusts the matrix estimator to be symmetric. The penalty level of the scaled Lasso for each column is determined entirely by the data via convex minimization, without cross-validation. We prove that this scaled Lasso method guarantees the fastest proven rate of convergence in the spectrum norm under conditions weaker than those in existing analyses of other $\ell_1$-regularized algorithms, and has a faster guaranteed rate of convergence when the ratio of the $\ell_1$ and spectrum norms of the target inverse matrix diverges to infinity. A simulation study demonstrates the computational feasibility and superb performance of the proposed method. Our analysis also provides new performance bounds for the Lasso and scaled Lasso that guarantee higher concentration of the error at a smaller threshold level than previous analyses, and that allow the use of the union bound in column-by-column applications of the scaled Lasso without adjusting the penalty level. In addition, least squares estimation after the scaled Lasso selection is considered and proven to guarantee performance bounds similar to those of the scaled Lasso.
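To make the column-by-column procedure concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes scikit-learn's Lasso as the inner solver, a simple alternating update of (beta, sigma) for the scaled Lasso, the illustrative penalty level sqrt(2 log p / n), and a smaller-magnitude symmetrization rule; the paper's data-driven penalty and symmetry adjustment may differ in detail.

```python
# Sketch of column-by-column scaled-Lasso precision-matrix estimation.
# Assumptions (not from the paper): scikit-learn's Lasso as the inner solver,
# alternating (beta, sigma) updates, penalty level sqrt(2 * log(p) / n),
# and a smaller-magnitude symmetrization of the off-diagonal entries.
import numpy as np
from sklearn.linear_model import Lasso


def scaled_lasso(X, y, lam, n_iter=20, tol=1e-6):
    """Minimize ||y - X b||^2 / (2 n sigma) + sigma / 2 + lam * ||b||_1
    by alternating a Lasso step (sigma fixed) with a noise-level update."""
    n = X.shape[0]
    sigma = np.std(y) + 1e-12                 # crude initial noise level
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # With sigma fixed, the problem is an ordinary Lasso with penalty lam * sigma.
        fit = Lasso(alpha=lam * sigma, fit_intercept=False, max_iter=10000).fit(X, y)
        beta = fit.coef_
        sigma_new = np.linalg.norm(y - X @ beta) / np.sqrt(n)
        if abs(sigma_new - sigma) < tol:
            sigma = sigma_new
            break
        sigma = sigma_new
    return beta, sigma


def precision_scaled_lasso(X):
    """Estimate a sparse precision matrix column by column, then symmetrize."""
    n, p = X.shape
    lam = np.sqrt(2.0 * np.log(p) / n)        # illustrative penalty level
    Theta = np.zeros((p, p))
    for j in range(p):
        idx = [k for k in range(p) if k != j]
        beta, sigma = scaled_lasso(X[:, idx], X[:, j], lam)
        Theta[j, j] = 1.0 / sigma ** 2        # diagonal from the estimated noise level
        Theta[idx, j] = -beta / sigma ** 2    # off-diagonal entries of column j
    # Keep the smaller-magnitude entry of each off-diagonal pair (illustrative rule).
    smaller = np.abs(Theta) <= np.abs(Theta.T)
    return np.where(smaller, Theta, Theta.T)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    print(precision_scaled_lasso(X).round(2))
```

The key point of the sketch is that the penalty parameter passed to the inner Lasso scales with the current noise-level estimate, so no tuning by cross-validation is needed for each column regression.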
