Generalization bounds for deep thresholding networks

We consider compressive sensing in the scenario where the sparsity basis (dictionary) is not known in advance but has to be learned from examples. Motivated by the well-known iterative soft thresholding algorithm for reconstruction, we define deep networks parametrized by the dictionary, which we call deep thresholding networks. From training samples, we aim to learn the optimal sparsifying dictionary and, thereby, the optimal network that reconstructs signals from their low-dimensional linear measurements. The dictionary is learned by minimizing the empirical risk. We derive generalization bounds by analyzing the Rademacher complexity of hypothesis classes consisting of such deep networks, and we obtain estimates of the sample complexity that depend only linearly on the dimensions and on the depth.
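
As a minimal sketch of this construction, the following Python/NumPy code unrolls iterative soft thresholding into a fixed-depth network parametrized by a dictionary Phi, and evaluates the squared-loss empirical risk on synthetic training signals. All names (soft_threshold, deep_thresholding_network), the unit step size, the orthogonal dictionary, and the squared loss are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def soft_threshold(x, tau):
        """Soft thresholding: the proximal operator of tau * ||.||_1."""
        return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

    def deep_thresholding_network(y, A, Phi, tau, depth):
        """Forward pass of a depth-layer thresholding network, obtained by
        unrolling iterative soft thresholding and parametrized by the
        dictionary Phi (the learnable quantity).

        y     : (m,)   low-dimensional linear measurements
        A     : (m, n) measurement matrix
        Phi   : (n, n) sparsifying dictionary
        tau   : threshold parameter (a unit step size is folded in here)
        depth : number of unrolled iterations, i.e. network layers
        """
        B = A @ Phi                 # effective matrix acting on the sparse coefficients
        x = np.zeros(Phi.shape[1])  # coefficients initialized at zero
        for _ in range(depth):
            # gradient step on ||B x - y||^2, followed by soft thresholding
            x = soft_threshold(x + B.T @ (y - B @ x), tau)
        return Phi @ x              # synthesize the reconstructed signal

    # Empirical risk of the network over n_train synthetic training signals
    rng = np.random.default_rng(0)
    n, m, n_train = 64, 32, 100
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    Phi = np.linalg.qr(rng.standard_normal((n, n)))[0]  # orthogonal dictionary
    signals = rng.standard_normal((n_train, n))
    risk = np.mean([np.sum((deep_thresholding_network(A @ s, A, Phi, 0.1, 10) - s) ** 2)
                    for s in signals])

Minimizing risk over Phi (e.g., by gradient descent constrained to the orthogonal group) would then yield the learned dictionary; the generalization bounds control the gap between this empirical risk and its expectation.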
