On Multi-Layer Basis Pursuit, Efficient Algorithms and Convolutional Neural Networks

Parsimonious representations are ubiquitous in modeling and processing information. Motivated by the recent Multi-Layer Convolutional Sparse Coding (ML-CSC) model, we herein generalize the traditional Basis Pursuit problem to a multi-layer setting, introducing similar sparsity-enforcing penalties at different representation layers in a symbiotic relation between synthesis and analysis sparse priors. We explore different iterative methods to solve this new problem in practice, and we propose a new Multi-Layer Iterative Soft Thresholding Algorithm (ML-ISTA), as well as a fast version (ML-FISTA). We show that these nested first-order algorithms converge, in the sense that the function value at near-fixed points can get arbitrarily close to the solution of the original problem. We further show how these algorithms effectively implement particular recurrent convolutional neural networks (CNNs) that generalize feed-forward ones without introducing any extra parameters. We present and analyze different architectures that result from unfolding the iterations of the proposed pursuit algorithms, including a new Learned ML-ISTA, providing a principled way to construct deep recurrent CNNs. Unlike other similar constructions, these architectures unfold a global pursuit holistically for the entire network. We demonstrate the emerging constructions in a supervised learning setting, consistently improving the performance of classical CNNs while keeping the number of parameters constant.
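To make the underlying pursuit concrete, below is a minimal single-layer ISTA sketch in NumPy for the classical Basis Pursuit denoising (Lasso) problem, min_γ ½‖x − Dγ‖² + λ‖γ‖₁. The function names `soft_threshold` and `ista` are illustrative, not from the paper; the multi-layer ML-ISTA discussed above replaces the single thresholding step by a nested, layer-wise composition of such proximal updates across the dictionaries D₁,…,D_L, which is what makes one iteration from a zero initialization resemble a feed-forward CNN pass.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1, applied element-wise."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista(x, D, lam, n_iter=500):
    """Plain ISTA for min_g 0.5 * ||x - D g||^2 + lam * ||g||_1."""
    # Step size 1/L, with L the Lipschitz constant of the gradient,
    # i.e., the largest eigenvalue of D^T D (squared spectral norm of D).
    t = 1.0 / np.linalg.norm(D, 2) ** 2
    g = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # Gradient step on the data-fidelity term, then soft thresholding.
        g = soft_threshold(g - t * D.T @ (D @ g - x), t * lam)
    return g
```

In the multi-layer setting, the thresholding would be interleaved per layer rather than applied once, and unfolding a fixed number of these iterations with trainable dictionaries yields the Learned ML-ISTA architecture described in the abstract.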
