Deep Sparse-coded Network (DSN)

We present the Deep Sparse-coded Network (DSN), a deep architecture built from multiple layers of sparse coding. Learning a useful feature hierarchy by naively stacking sparse coding layers has been considered difficult, primarily because sparse coding assumes a dense input and produces a sparse output vector; applying one sparse coding layer to the output of another therefore violates this modeling assumption. We overcome this shortcoming by interlacing nonlinear pooling units: average- or max-pooled sparse codes are aggregated into dense input vectors for the next sparse coding layer. Pooling provides a nonlinear activation analogous to that of neural networks without introducing diminished gradient flow during training. We also introduce a novel backpropagation algorithm to fine-tune the DSN beyond the pretraining performed by greedy layerwise sparse coding and dictionary learning. We build an experimental 4-layer DSN with ℓ1-regularized LARS and greedy-ℓ0 OMP, and demonstrate superior performance over a similarly configured stacked autoencoder (SAE) on CIFAR-10.
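The sketch below illustrates the layering idea described above (sparse coding, then pooling, then sparse coding again), not the authors' implementation: dictionary sizes, the patch dimension, and the pooling group size are illustrative assumptions, greedy-ℓ0 OMP is used for both layers via scikit-learn, and the backpropagation fine-tuning step is omitted.

```python
# Minimal DSN-style forward pass: sparse coding -> max pooling -> sparse coding.
# Hyperparameters here are assumptions for illustration only.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(1000, 64)          # stand-in for 8x8 image patches (dense input)

# Layer 1: learn an overcomplete dictionary, encode with greedy-l0 OMP.
layer1 = MiniBatchDictionaryLearning(
    n_components=128, transform_algorithm="omp",
    transform_n_nonzero_coefs=5, random_state=0)
Z1 = layer1.fit_transform(X)     # sparse codes, shape (1000, 128)

# Max pooling over groups of 4 codes: aggregates sparse codes into a
# dense, lower-dimensional input for the next sparse coding layer.
P1 = np.abs(Z1).reshape(Z1.shape[0], -1, 4).max(axis=2)   # shape (1000, 32)

# Layer 2: sparse coding on the pooled (now dense) representation.
layer2 = MiniBatchDictionaryLearning(
    n_components=64, transform_algorithm="omp",
    transform_n_nonzero_coefs=5, random_state=0)
Z2 = layer2.fit_transform(P1)    # second-layer sparse codes, shape (1000, 64)

print(Z1.shape, P1.shape, Z2.shape)
```

In a full DSN this pattern would repeat for further layers, the dictionaries would be pretrained greedily layer by layer, and the whole stack would then be fine-tuned with the backpropagation procedure described in the paper.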
