Learning Deep $\ell_0$ Encoders

Despite its nonconvex nature, $\ell_0$ sparse approximation is desirable in many theoretical and practical settings. We study the $\ell_0$ sparse approximation problem with the tools of deep learning by proposing Deep $\ell_0$ Encoders. Two typical forms, the $\ell_0$-regularized problem and the $M$-sparse problem, are investigated. Starting from solid iterative algorithms, we model them as feed-forward neural networks by introducing novel neurons and pooling functions. Enforcing such structural priors acts as an effective network regularization. The deep encoders also enjoy faster inference, larger learning capacity, and better scalability than conventional sparse coding solutions. Furthermore, under task-driven losses, the models can be conveniently optimized end to end. Numerical results demonstrate the strong performance of the proposed encoders.
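To make the construction concrete, below is a minimal NumPy sketch of the two ingredients the abstract names: a hard-thresholding neuron for the $\ell_0$-regularized form and a max-$M$ selection (a pooling-like operator) for the $M$-sparse form, wired into a few unfolded iterations of iterative hard thresholding (IHT). The function names (`hard_threshold`, `max_m_pool`, `deep_l0_encoder`) and the classical initialization $W = D^\top$, $S = I - D^\top D$ are illustrative assumptions, not the paper's exact parameterization; in the learned encoder these matrices and thresholds would be trained by backpropagation.

```python
import numpy as np

def hard_threshold(z, theta):
    # Elementwise hard thresholding: zero out entries with |z_i| <= theta.
    return z * (np.abs(z) > theta)

def max_m_pool(z, M):
    # Keep the M largest-magnitude entries of z; zero the rest.
    out = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-M:]
    out[keep] = z[keep]
    return out

def deep_l0_encoder(x, W, S, theta, depth=2):
    # Truncated (unfolded) IHT as a feed-forward network:
    #   z^{k+1} = h_theta(W x + S z^k),  z^0 = h_theta(W x).
    z = hard_threshold(W @ x, theta)
    for _ in range(depth):
        z = hard_threshold(W @ x + S @ z, theta)
    return z

# Toy usage: random dictionary D, with W and S set as plain IHT would.
rng = np.random.default_rng(0)
n, m = 16, 32                                   # signal dim, code dim
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
W = D.T                                         # W = D^T (assumed init)
S = np.eye(m) - D.T @ D                         # S = I - D^T D (assumed init)
x = D @ max_m_pool(rng.standard_normal(m), 3)   # 3-sparse ground truth
z = deep_l0_encoder(x, W, S, theta=0.5)
print(np.count_nonzero(z))
```

In a trainable version, `W`, `S`, and `theta` become layer parameters (shared or untied across depth), `max_m_pool` replaces `hard_threshold` for the $M$-sparse variant, and the nondifferentiable hard nonlinearity would need a differentiable surrogate so gradients can flow end to end.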
