Structured Sparsity: Discrete and Convex Approaches

Over the past decades, sparsity has proven to be of significant importance in fields such as compression, signal sampling and analysis, machine learning, and optimization. In fact, most natural data can be sparsely represented, i.e., a small set of coefficients is sufficient to describe the data in an appropriate basis. Sparsity is also used to enhance interpretability in real-life applications, where the relevant information typically resides in a low-dimensional space. However, the true underlying structure of many signal processing and machine learning problems is often more sophisticated than sparsity alone. In practice, applications differ in the sparsity patterns that exist among the coefficients. To better understand the impact of such structured sparsity patterns, in this chapter we review some realistic sparsity models and unify their convex and non-convex treatments. We start with the general group sparse model and then elaborate on two important special cases: the dispersive and hierarchical models. We also consider more general structures defined by set functions and present their convex proxies. Further, we discuss efficient optimization solutions for structured sparsity problems and illustrate structured sparsity in action via three applications in image processing, neuronal signal processing, and confocal imaging.
