Sparse-coded net model and applications

As an unsupervised learning method, sparse coding can discover high-level representations of an input across a wide variety of learning problems. In semi-supervised settings, sparse coding is used to extract features for a supervised task such as classification. While sparse representations learned from unlabeled data independently of the supervised task perform well, we argue that sparse coding should also be built as a holistic learning unit that optimizes the supervised task objective more explicitly. In this paper, we propose the sparse-coded net, a feedforward model that integrates sparse coding with task-driven output layers, and we describe its training methods in detail. After pretraining a sparse-coded net via semi-supervised learning, we optimize its task-specific performance with a novel backpropagation algorithm that can traverse nonlinear feature pooling operators to update the dictionary; the sparse-coded net thus performs supervised dictionary learning. We evaluate the sparse-coded net on classification problems over sound, image, and text data. The results confirm a significant improvement over semi-supervised learning, as well as superior classification performance against deep stacked-autoencoder neural networks and GMM-SVM pipelines in small- to medium-scale settings.
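To fix ideas, the following is a minimal sketch of the forward pass such a model implies: sparse-code the input against a dictionary, apply a nonlinear pooling operator, and feed the pooled code to a task-driven output layer. The ISTA encoder, the max-pooling choice, and the names `sparse_encode`/`forward` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def soft_threshold(x, lam):
    # Elementwise soft-thresholding, the proximal operator of the L1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_encode(x, D, lam=0.1, n_iter=100):
    # Solve min_z 0.5*||x - D z||^2 + lam*||z||_1 with ISTA
    # (proximal gradient descent); any LASSO solver would do here.
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)
        z = soft_threshold(z - grad / L, lam / L)
    return z

def forward(patches, D, W, b):
    # Hypothetical sparse-coded-net forward pass: encode each patch,
    # max-pool the sparse codes, then apply a softmax output layer.
    Z = np.stack([sparse_encode(p, D) for p in patches])   # (n_patches, k)
    h = Z.max(axis=0)                                      # nonlinear pooling
    logits = W @ h + b
    e = np.exp(logits - logits.max())                      # stable softmax
    return e / e.sum()                                     # class probabilities
```

In the paper's scheme, training would additionally backpropagate the task loss through the pooling operator and the sparse codes to update both the output weights and the dictionary; that derivation is beyond the scope of this sketch.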
