Deeply-Sparse Signal rePresentations ($\text{D}\text{S}^2\text{P}$)

A recent line of work shows that a deep neural network with ReLU nonlinearities arises from a finite sequence of cascaded sparse coding models, the outputs of which, except for the last element in the cascade, are sparse and unobservable. That is, the intermediate outputs deep in the cascade are sparse, hence the title of this manuscript. We show here, using techniques from the dictionary learning literature, that if the measurement matrices in the cascaded sparse coding model (a) satisfy the restricted isometry property (RIP) and (b) all have sparse columns except for the last, they can be recovered with high probability. We propose two algorithms for this purpose: one that recovers the matrices in a forward sequence, and another that recovers them in a backward sequence. The method of choice in deep learning for solving this problem is to train an auto-encoder. Our algorithms provide a sound alternative, with theoretical guarantees as well as upper bounds on sample complexity. The theory shows that the learning complexity of the forward algorithm depends on the number of hidden units at the deepest layer and on the number of active neurons at that layer (its sparsity). In addition, the theory relates the numbers of hidden units in successive layers, thus giving a practical prescription for designing deep ReLU neural networks. Because it puts fewer restrictions on the architecture, the backward algorithm requires more data. We demonstrate the deep dictionary learning algorithms via simulations. Finally, we use a coupon-collection argument to conjecture a lower bound on sample complexity that gives some insight into why deep networks require more data to train than shallow ones.
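To make the generative model concrete, the following is a minimal Python sketch of a cascaded sparse coding model of the kind described above. The layer widths, sparsity levels, and the column-sparse random construction are illustrative assumptions rather than the paper's exact construction; RIP is assumed, not verified. A sparse, nonnegative code at the deepest layer is pushed through a cascade of dictionaries, the intermediate codes remain sparse, and only the final output is observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer widths: dims[0] is the observed dimension,
# dims[-1] the deepest (widest) hidden layer.
dims = [64, 128, 256]
s_col = 4    # column sparsity of the dictionaries (assumed)
s_code = 4   # sparsity of the deepest code x_L (assumed)

def sparse_column_matrix(m, n, s):
    """Random matrix with s-sparse, unit-norm columns; a stand-in for a
    column-sparse dictionary satisfying RIP (not checked here)."""
    A = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=s, replace=False)
        col = rng.standard_normal(s)
        A[rows, j] = col / np.linalg.norm(col)
    return A

# Cascade A_1, ..., A_L with y = A_1 x_1 and x_{l-1} = A_l x_l.
# For illustration every dictionary is column-sparse; the paper's
# assumption exempts one of them from this requirement.
A = [sparse_column_matrix(dims[l - 1], dims[l], s_col)
     for l in range(1, len(dims))]

# Deepest code: sparse and nonnegative (nonnegativity is what makes the
# layer-by-layer recovery look like a ReLU network's forward pass).
x = np.zeros(dims[-1])
x[rng.choice(dims[-1], size=s_code, replace=False)] = rng.uniform(0.5, 1.5, s_code)

# Propagate down the cascade; products of sparse codes with column-sparse
# dictionaries stay sparse, but only the final y is observed.
for A_l in reversed(A):
    x = A_l @ x
y = x
print("observation dim:", y.shape, "| nonzeros in y:", np.count_nonzero(y))
```

In this sketch the dictionaries A_l and the intermediate codes are exactly the quantities the forward and backward algorithms aim to recover from samples of y alone.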
