Theoretical Foundations of Deep Learning via Sparse Representations: A Multilayer Sparse Model and Its Connection to Convolutional Neural Networks

Modeling data is the way we, as scientists, believe that information should be explained and handled. Indeed, models play a central role in practically every task in signal and image processing and machine learning. Sparse representation theory (we shall refer to it as Sparseland) puts forward an emerging, highly effective, and universal model. Its core idea is the description of data as a linear combination of a few atoms taken from a dictionary of such fundamental elements.
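
To make the core idea concrete, here is a minimal sketch of a signal built as a linear combination of a few atoms. The dictionary size, the random atoms, the chosen coefficients, and the names `synthesize`, `D`, `gamma`, and `x` are all illustrative assumptions and are not taken from the paper.

```python
import random


def synthesize(dictionary, coefficients):
    """Return the signal x = sum_j gamma_j * d_j.

    `dictionary` is a list of atoms (each a list of floats of equal length);
    `coefficients` maps an atom index to its nonzero weight, i.e. it stores
    only the nonzero entries of the sparse representation vector.
    """
    n = len(dictionary[0])
    signal = [0.0] * n
    for index, weight in coefficients.items():
        atom = dictionary[index]
        for i in range(n):
            signal[i] += weight * atom[i]
    return signal


# Hypothetical toy dictionary: 6 atoms of dimension 4 (overcomplete, since 6 > 4).
rng = random.Random(0)
D = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]

# Sparse representation: only 2 of the 6 coefficients are nonzero.
gamma = {1: 2.0, 4: -0.5}
x = synthesize(D, gamma)
```

Here `x` depends on only two of the six atoms. Recovering such a sparse `gamma` from a given `x` is the harder inverse problem, which this sketch does not attempt.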
