Methods for Sparse and Low-Rank Recovery under Simplex Constraints

The de facto standard approach of promoting sparsity by means of $\ell_1$-regularization becomes ineffective in the presence of simplex constraints, i.e.,~when the target is known to have non-negative entries that sum to a given constant. The situation is analogous for nuclear norm regularization in the low-rank recovery of Hermitian positive semidefinite matrices with given trace. In the present paper, we discuss several strategies to deal with this situation, from simple to more complex. As a starting point, we consider empirical risk minimization (ERM), which by existing theory enjoys better guarantees w.r.t.~prediction and $\ell_2$-estimation error than $\ell_1$-regularization. In light of this, we argue that ERM combined with a subsequent sparsification step such as thresholding is superior to the heuristic of dropping the sum constraint, applying $\ell_1$-regularization, and normalizing the result afterwards. At the next level, we show that no sparsity-promoting regularizer under simplex constraints can be convex. We therefore propose a novel sparsity-promoting regularization scheme based on the inverse or the negative of the squared $\ell_2$-norm, which avoids shortcomings of various alternative methods from the literature. Our approach extends naturally to Hermitian positive semidefinite matrices with given trace. Numerical studies on compressed sensing, sparse mixture density estimation, portfolio optimization, and quantum state tomography illustrate the key points of the paper.
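As a rough illustration of the negative squared $\ell_2$-norm idea in the vector case, the sketch below minimizes a least-squares loss minus $\lambda\|\beta\|_2^2$ over the simplex via projected gradient descent. It is a minimal sketch under stated assumptions: the function names (`project_simplex`, `neg_l2_simplex_ls`), the fixed step size, and the use of plain projected gradient rather than a DC-programming scheme are illustrative choices, not the paper's actual algorithm.

```python
import numpy as np

def project_simplex(v, s=1.0):
    """Euclidean projection onto {x : x >= 0, sum(x) = s} via sorting."""
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - s) / idx > 0)[0][-1]
    theta = (css[rho] - s) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def neg_l2_simplex_ls(X, y, lam=0.1, s=1.0, n_iter=1000):
    """Projected gradient for the non-convex problem
         min_beta  0.5 * ||y - X beta||_2^2 - lam * ||beta||_2^2
         s.t.      beta >= 0,  sum(beta) = s.
    The objective is a difference of convex functions, so this reaches
    a stationary point, not necessarily a global minimum."""
    p = X.shape[1]
    beta = np.full(p, s / p)                             # feasible start
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 2 * lam)  # 1 / Lipschitz const.
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) - 2.0 * lam * beta
        beta = project_simplex(beta - step * grad, s)
    return beta
```

Because the concave term makes the problem non-convex in general, in practice one would restart from several feasible points or replace the plain gradient step with a majorize-minimize / DC-programming scheme of the kind the paper builds on.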
