On $l_q$ Optimization and Sparse Inverse Covariance Selection

Graphical models are well established as a means of providing meaningful conditional-probability descriptions of complex multivariable interactions. In the Gaussian case, the conditional independencies between different variables correspond to zero entries in the precision (inverse covariance) matrix. Hence, there has been much recent interest in sparse precision matrix estimation in areas such as statistics, machine learning, computer vision, pattern recognition, and signal processing. A popular estimation method involves solving a penalized log-likelihood problem, where the penalty induces sparsity and a common choice is the convex l1 norm. Although the l0 penalty is the natural choice for guaranteeing maximum sparsity, it has been avoided because it is non-convex. In this paper we therefore bridge the gap between these two penalties and propose the non-concave lq penalized log-likelihood problem for sparse precision matrix estimation, where 0 ≤ q < 1. A novel algorithm is developed for the optimization, and we provide some of its theoretical properties, which are also useful for sparse linear regression. We illustrate the approach on synthetic and real data, comparing the reconstruction quality obtained with the sparsity-inducing penalties l0, lq with 0 < q < 1, l1, and SCAD.
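
For reference, the estimator described in the abstract can be written in the generic form used throughout the sparse precision matrix literature; the notation below (sample covariance $S$, precision matrix $\Theta$, regularization weight $\lambda$) is standard rather than quoted from the paper:

$$\hat{\Theta} \;=\; \operatorname*{arg\,min}_{\Theta \succ 0}\; \Big\{ -\log\det\Theta \;+\; \operatorname{tr}(S\Theta) \;+\; \lambda \sum_{i \neq j} |\theta_{ij}|^{q} \Big\}, \qquad 0 \le q < 1,$$

with the convention $|\theta_{ij}|^{0} = \mathbb{1}\{\theta_{ij} \neq 0\}$, so that $q = 0$ recovers the l0 penalty, while setting $q = 1$ instead would give the convex l1 (graphical lasso) penalty.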

[1]  R. Tibshirani, et al. Regression shrinkage and selection via the lasso: a retrospective, 2011.

[2]  Emmanuel J. Candès, et al. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, 2004, IEEE Transactions on Information Theory.

[3]  N. Meinshausen, et al. Stability selection, 2008, arXiv:0809.2932.

[4]  Goran Marjanovic, et al. On $l_q$ Optimization and Matrix Completion, 2012, IEEE Transactions on Signal Processing.

[5]  Shiqian Ma, et al. Sparse Inverse Covariance Selection via Alternating Linearization Methods, 2010, NIPS.

[6]  Stéphane Mallat, et al. Matching pursuits with time-frequency dictionaries, 1993, IEEE Trans. Signal Process.

[7]  P. Huard, et al. Point-to-set maps and mathematical programming, 1979, Mathematical Programming Study 10.

[8]  Victor Solo, et al. On vector L0 penalized multivariate regression, 2012, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[9]  Mike E. Davies, et al. Iterative Hard Thresholding and L0 Regularisation, 2007, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[10]  Goran Marjanovic, et al. lq matrix completion, 2012, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11]  Goran Marjanovic, et al. L0 sparse graphical modeling, 2011, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[12]  Kim-Chuan Toh, et al. Solving Log-Determinant Optimization Problems by a Newton-CG Primal Proximal Point Algorithm, 2010, SIAM J. Optim.

[14]  Pei Wang, et al. Partial Correlation Estimation by Joint Sparse Regression Models, 2008, Journal of the American Statistical Association.

[15]  V. Solo, et al. Matrix completion, 2012.

[16]  T. Hastie, et al. SparseNet: Coordinate Descent With Nonconvex Penalties, 2011, Journal of the American Statistical Association.

[17]  Jorge Nocedal, et al. Newton-Like Methods for Sparse Inverse Covariance Estimation, 2012, NIPS.

[18]  Jianqing Fan, et al. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties, 2001.

[19]  Alexandre d'Aspremont, et al. Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data, 2008, Journal of Machine Learning Research.

[20]  Goran Marjanovic, et al. On exact lq denoising, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[21]  Dirk A. Lorenz, et al. Minimization of Non-smooth, Non-convex Functionals by Iterative Thresholding, 2015, J. Optim. Theory Appl.

[22]  Yuehua Wu, et al. Tuning parameter selection for penalized likelihood estimation of inverse covariance matrix, 2009.

[23]  R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[24]  Jianqing Fan, et al. Network exploration via the adaptive lasso and SCAD penalties, 2009, The Annals of Applied Statistics.

[25]  Xiaoming Yuan, et al. Alternating Direction Methods for Sparse Covariance Selection, 2009.

[26]  Adam J. Rothman, et al. Sparse estimation of large covariance matrices via a nested Lasso penalty, 2008, arXiv:0803.3872.

[27]  N. Meinshausen, et al. High-dimensional graphs and variable selection with the Lasso, 2006, arXiv:math/0608017.

[28]  Marc Teboulle, et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, 2009, SIAM J. Imaging Sci.

[29]  Mike E. Davies, et al. Gradient Pursuits, 2008, IEEE Transactions on Signal Processing.

[30]  Massimo Fornasier, et al. Compressive Sensing, 2015, Handbook of Mathematical Methods in Imaging.

[31]  Adam J. Rothman, et al. Sparse permutation invariant covariance estimation, 2008, arXiv:0801.4837.

[32]  Peter Bühlmann. Regression shrinkage and selection via the Lasso: a retrospective (Robert Tibshirani): Comments on the presentation, 2011.

[33]  Michael A. Saunders, et al. Atomic Decomposition by Basis Pursuit, 1998, SIAM J. Sci. Comput.

[34]  I. F. Gorodnitsky, et al. Neuromagnetic source imaging with FOCUSS: a recursive weighted minimum norm algorithm, 1995, Electroencephalography and Clinical Neurophysiology.

[35]  Rina Foygel, et al. Extended Bayesian Information Criteria for Gaussian Graphical Models, 2010, NIPS.

[36]  Jianqing Fan, et al. Comments on «Wavelets in statistics: A review», 2009.

[37]  A. d'Aspremont, et al. A Pathwise Algorithm for Covariance Selection, 2009, arXiv:0908.0143.

[38]  Jiahua Chen, et al. Extended Bayesian information criteria for model selection with large model spaces, 2008.

[39]  D. Donoho, et al. Sparse MRI: The application of compressed sensing for rapid MR imaging, 2007, Magnetic Resonance in Medicine.

[40]  Haipeng Shen, et al. Analysis of call centre arrival data using singular value decomposition, 2005.

[41]  Alexandre d'Aspremont, et al. First-Order Methods for Sparse Covariance Selection, 2006, SIAM J. Matrix Anal. Appl.

[42]  Pradeep Ravikumar, et al. Sparse inverse covariance matrix estimation using quadratic approximation, 2011, MLSLP.

[43]  R. Tibshirani, et al. Pathwise coordinate optimization, 2007, arXiv:0708.1485.

[44]  Jianhua Z. Huang, et al. Covariance matrix selection and estimation via penalised normal likelihood, 2006.

[45]  Larry A. Wasserman, et al. Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models, 2010, NIPS.

[46]  Peter Bühlmann, et al. Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm, 2007, J. Mach. Learn. Res.

[47]  Graham J. Wills, et al. Introduction to graphical modelling, 1995.

[48]  T. Blumensath, et al. Iterative Thresholding for Sparse Approximations, 2008.

[49]  Stephen J. Wright, et al. Sparse Reconstruction by Separable Approximation, 2008, IEEE Transactions on Signal Processing.

[50]  Stéphane Canu, et al. Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming, 2009, IEEE Transactions on Signal Processing.

[51]  R. Tibshirani, et al. Sparse inverse covariance estimation with the graphical lasso, 2008, Biostatistics.

[52]  Katya Scheinberg, et al. Learning Sparse Gaussian Markov Networks Using a Greedy Coordinate Ascent Approach, 2010, ECML/PKDD.

[53]  P. Tseng. Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization, 2001.

[54]  Walter Zucchini, et al. Model Selection, 2011, International Encyclopedia of Statistical Science.

[55]  R. Kohn, et al. Efficient estimation of covariance selection models, 2003.

[56]  D. Edwards. Introduction to graphical modelling, 1995.

[57]  Robert Tibshirani, et al. Spectral Regularization Algorithms for Learning Large Incomplete Matrices, 2010, J. Mach. Learn. Res.

[58]  Rick Chartrand. Exact Reconstruction of Sparse Signals via Nonconvex Minimization, 2007, IEEE Signal Processing Letters.

[59]  M. Yuan, et al. Model selection and estimation in the Gaussian graphical model, 2007.

[60]  Avishai Mandelbaum, et al. Statistical Analysis of a Telephone Call Center, 2005.

[61]  Jian Huang, et al. Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection, 2011, The Annals of Applied Statistics.

[62]  R. G. Baraniuk. Compressive Sensing [Lecture Notes], 2007, IEEE Signal Processing Magazine.

[63]  Jianqing Fan, et al. Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation, 2007, Annals of Statistics.

[64]  Jianqing Fan, et al. Comments on «Wavelets in statistics: A review» by A. Antoniadis, 1997.

[65]  Pierre Baldi, et al. Assessing the accuracy of prediction algorithms for classification: an overview, 2000, Bioinformatics.