Image Classification Based on Convolutional Denoising Sparse Autoencoder

Image classification aims to group images into corresponding semantic categories. Due to the difficulties of interclass similarity and intraclass variability, it is a challenging issue in computer vision. In this paper, an unsupervised feature learning approach called convolutional denoising sparse autoencoder (CDSAE) is proposed based on the theory of visual attention mechanism and deep learning methods. Firstly, saliency detection method is utilized to get training samples for unsupervised feature learning. Next, these samples are sent to the denoising sparse autoencoder (DSAE), followed by convolutional layer and local contrast normalization layer. Generally, prior in a specific task is helpful for the task solution. Therefore, a new pooling strategy—spatial pyramid pooling (SPP) fused with center-bias prior—is introduced into our approach. Experimental results on the common two image datasets (STL-10 and CIFAR-10) demonstrate that our approach is effective in image classification. They also demonstrate that none of these three components: local contrast normalization, SPP fused with center-prior, and vector normalization can be excluded from our proposed approach. They jointly improve image representation and classification performance.

[1]  H. Bourlard,et al.  Auto-association by multilayer perceptrons and singular value decomposition , 1988, Biological Cybernetics.

[2]  Narendra Ahuja,et al.  Cresceptron: a self-organizing neural network which grows adaptively , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[3]  Lihi Zelnik-Manor,et al.  Context-Aware Saliency Detection , 2012, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Ammad Ali,et al.  Face Recognition with Local Binary Patterns , 2012 .

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Geoffrey E. Hinton Deep belief networks , 2009, Scholarpedia.

[7]  Tim K Marks,et al.  SUN: A Bayesian framework for saliency using natural statistics. , 2008, Journal of vision.

[8]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[9]  Geoffrey E. Hinton,et al.  Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images , 2010, AISTATS.

[10]  Jae Hyun Lim,et al.  Multiple Kernel Learning with Hierarchical Feature Representations , 2013, ICONIP.

[11]  Jean Ponce,et al.  A Theoretical Analysis of Feature Pooling in Visual Recognition , 2010, ICML.

[12]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[13]  Quoc V. Le,et al.  Tiled convolutional neural networks , 2010, NIPS.

[14]  L. Itti,et al.  Quantifying center bias of observers in free viewing of dynamic natural scenes. , 2009, Journal of vision.

[15]  Simon J. Doran,et al.  Stacked Autoencoders for Unsupervised Feature Learning and Multiple Organ Detection in a Pilot Study Using 4D Patient Data , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Rob Fergus,et al.  Stochastic Pooling for Regularization of Deep Convolutional Neural Networks , 2013, ICLR.

[17]  Honglak Lee,et al.  Sparse deep belief net model for visual area V2 , 2007, NIPS.

[18]  Y-Lan Boureau,et al.  Learning Convolutional Feature Hierarchies for Visual Recognition , 2010, NIPS.

[19]  Geoffrey E. Hinton,et al.  Modeling pixel means and covariances using factorized third-order boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[21]  Aykut Erdem,et al.  Visual saliency estimation by nonlinearly integrating features using region covariances. , 2013, Journal of vision.

[22]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[23]  Yong Luo,et al.  Large Margin Multi-Modal Multi-Task Feature Extraction for Image Classification , 2019, IEEE Transactions on Image Processing.

[24]  Simon Günter,et al.  A Stochastic Quasi-Newton Method for Online Convex Optimization , 2007, AISTATS.

[25]  Ali Borji,et al.  State-of-the-Art in Visual Attention Modeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[27]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[28]  R. A. Leibler,et al.  On Information and Sufficiency , 1951 .

[29]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[30]  Chao Zhang,et al.  Parameterised sigmoid and reLU hidden activation functions for DNN acoustic modelling , 2015, INTERSPEECH.

[31]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[32]  Xin Zhang,et al.  Single-layer Unsupervised Feature Learning with l2 regularized sparse filtering , 2014, 2014 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP).

[33]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[34]  Jianzhong Wu,et al.  Stacked Sparse Autoencoder (SSAE) based framework for nuclei patch classification on breast cancer histopathology , 2014, 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI).

[35]  Bernhard Schölkopf,et al.  A Nonparametric Approach to Bottom-Up Visual Saliency , 2006, NIPS.

[36]  Nicolas Pinto,et al.  Why is Real-World Visual Object Recognition Hard? , 2008, PLoS Comput. Biol..

[37]  Frédo Durand,et al.  Learning to predict where humans look , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[38]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[39]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[40]  Andrew Y. Ng,et al.  Selecting Receptive Fields in Deep Networks , 2011, NIPS.

[41]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Tara N. Sainath,et al.  Accelerating Hessian-free optimization for Deep Neural Networks by implicit preconditioning and sampling , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[43]  Jian Sun,et al.  Convolutional neural networks at constrained time cost , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[45]  Kilian Q. Weinberger,et al.  Deep Networks with Stochastic Depth , 2016, ECCV.

[46]  Ling Shao,et al.  Feature Learning for Image Classification Via Multiobjective Genetic Programming , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[47]  Bin Fang,et al.  Scene classification based on single-layer SAE and SVM , 2015, Expert Syst. Appl..

[48]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[49]  Domenec Puig,et al.  Recognizing Traffic Signs Using a Practical Deep Neural Network , 2015, ROBOT.

[50]  Jürgen Schmidhuber,et al.  Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.

[51]  Quoc V. Le,et al.  On optimization methods for deep learning , 2011, ICML.

[52]  Albert Ali Salah,et al.  A Selective Attention-Based Method for Visual Pattern Recognition with Application to Handwritten Digit Recognition and Face Recognition , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[53]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[54]  Kunihiko Fukushima,et al.  Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[55]  Marc'Aurelio Ranzato,et al.  Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[56]  Andrew Zisserman,et al.  Deep Fisher Networks for Large-Scale Image Classification , 2013, NIPS.

[57]  Bo Du,et al.  Saliency-Guided Unsupervised Feature Learning for Scene Classification , 2015, IEEE Transactions on Geoscience and Remote Sensing.

[58]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[59]  Juan Humberto Sossa Azuela,et al.  Self organizing natural scene image retrieval , 2013, Expert Syst. Appl..

[60]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[61]  Elena Mazurova Accuracy of Measurements of Eye-Tracking of a human perception on the screen , 2014 .

[62]  Benjamin W Tatler,et al.  The central fixation bias in scene viewing: selecting an optimal viewing position independently of motor biases and image feature distributions. , 2007, Journal of vision.

[63]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2015, IEEE Trans. Pattern Anal. Mach. Intell..