Convolutional Sparse Autoencoders for Image Classification

Convolutional sparse coding (CSC) can model local connections between image content and reduce the code redundancy when compared with patch-based sparse coding. However, CSC needs a complicated optimization procedure to infer the codes (i.e., feature maps). In this brief, we proposed a convolutional sparse auto-encoder (CSAE), which leverages the structure of the convolutional AE and incorporates the max-pooling to heuristically sparsify the feature maps for feature learning. Together with competition over feature channels, this simple sparsifying strategy makes the stochastic gradient descent algorithm work efficiently for the CSAE training; thus, no complicated optimization procedure is involved. We employed the features learned in the CSAE to initialize convolutional neural networks for classification and achieved competitive results on benchmark data sets. In addition, by building connections between the CSAE and CSC, we proposed a strategy to construct local descriptors from the CSAE for classification. Experiments on Caltech-101 and Caltech-256 clearly demonstrated the effectiveness of the proposed method and verified the CSAE as a CSC model has the ability to explore connections between neighboring image content for classification tasks.

[1]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  Marc'Aurelio Ranzato,et al.  Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[4]  Graham W. Taylor,et al.  Adaptive deconvolutional networks for mid and high level feature learning , 2011, 2011 International Conference on Computer Vision.

[5]  Marc'Aurelio Ranzato,et al.  Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.

[6]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[7]  Jürgen Schmidhuber,et al.  Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.

[8]  Maoguo Gong,et al.  A Multiobjective Sparse Feature Learning Model for Deep Neural Networks , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[9]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[10]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  Philip S. Yu,et al.  The Unsupervised Hierarchical Convolutional Sparse Auto-Encoder for Neuroimaging Data Classification , 2015, BIH.

[12]  Jian Yang,et al.  Sparseness Analysis in the Pretraining of Deep Neural Networks , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[13]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Anders P. Eriksson,et al.  Fast Convolutional Sparse Coding , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Jacek M. Zurada,et al.  Deep Learning of Part-Based Representation of Data Using Sparse Autoencoders With Nonnegativity Constraints , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[17]  Vincent Lepetit,et al.  On the relevance of sparsity for image classification , 2014, Comput. Vis. Image Underst..

[18]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[19]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[20]  Y-Lan Boureau,et al.  Learning Convolutional Feature Hierarchies for Visual Recognition , 2010, NIPS.

[21]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[22]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[23]  Brendan J. Frey,et al.  k-Sparse Autoencoders , 2013, ICLR.

[24]  Christian Wolf,et al.  Spatio-Temporal Convolutional Sparse Auto-Encoder for Sequence Classification , 2012, BMVC.

[25]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[26]  Marc'Aurelio Ranzato,et al.  Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  David B. Dunson,et al.  Deep Learning with Hierarchical Convolutional Factor Analysis , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.