Compressible Distributions for High-Dimensional Statistics

We develop a principled way of identifying probability distributions whose independent and identically distributed realizations are compressible, i.e., can be well approximated as sparse. We focus on Gaussian compressed sensing, an example of underdetermined linear regression, where compressibility is known to ensure the success of estimators exploiting sparse regularization. We prove that many distributions revolving around maximum a posteriori (MAP) interpretation of sparse regularized estimators are in fact incompressible, in the limit of large problem sizes. We especially highlight the Laplace distribution and ℓ1-regularized estimators such as the Lasso and basis pursuit denoising. We rigorously disprove the myth that the success of ℓ1 minimization for compressed sensing image reconstruction is a simple corollary of a Laplace model of images combined with Bayesian MAP estimation, and show that in fact quite the reverse is true. To establish this result, we identify nontrivial undersampling regions where the simple least-squares solution almost surely outperforms an oracle sparse solution, when the data are generated from the Laplace distribution. We also provide simple rules of thumb to characterize classes of compressible and incompressible distributions based on their second and fourth moments. Generalized Gaussian and generalized Pareto distributions serve as running examples.
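The notion of compressibility used above can be illustrated numerically: a vector of i.i.d. samples is compressible if the relative error of its best k-term approximation is small for k much smaller than the dimension. The sketch below, in Python, compares i.i.d. Laplace samples with i.i.d. heavy-tailed Pareto-type samples under this proxy. The dimension, the fraction of retained coefficients, and the Pareto tail index are illustrative choices, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rel_best_k_term_error(x, k):
    """Relative l2 error of the best k-term approximation of x."""
    mags = np.sort(np.abs(x))[::-1]   # magnitudes in decreasing order
    return np.sqrt(np.sum(mags[k:] ** 2) / np.sum(mags ** 2))

N = 100_000        # illustrative problem size
k = N // 20        # keep only the largest 5% of coefficients

# i.i.d. Laplace samples: the abstract argues such vectors are incompressible,
# so a sizeable fraction of the energy remains outside the top-k entries.
x_laplace = rng.laplace(loc=0.0, scale=1.0, size=N)

# i.i.d. heavy-tailed (Pareto-type) samples: a candidate compressible model;
# the tail index a=1.0 is an arbitrary illustrative choice.
x_pareto = rng.pareto(a=1.0, size=N) * rng.choice([-1.0, 1.0], size=N)

print("Laplace relative error :", rel_best_k_term_error(x_laplace, k))
print("Pareto  relative error :", rel_best_k_term_error(x_pareto, k))
```

In this hedged experiment the Laplace vector keeps a substantial relative approximation error after discarding 95% of its entries, whereas the heavy-tailed vector concentrates almost all of its energy in the few largest coefficients, matching the qualitative distinction drawn in the abstract.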
