Deep Feature Representation via Multiple Stack Auto-Encoders

Recently, deep architectures such as stacked auto-encoders (SAEs) have been used to learn features from unlabeled data. However, it is difficult to obtain multi-level visual information from such traditional deep architectures. In this paper, a feature representation method that concatenates Multiple Different Stack Auto-Encoders (MDSAEs) is presented. The proposed method tries to imitate how the human visual cortex recognizes objects from different views. The output of the last hidden layer of each SAE is regarded as one kind of feature, and these features are concatenated into a final representation according to their weights (outputs of deeper architectures receive higher weights, shallower ones lower weights). In this way, the hierarchical structure of the human visual cortex can be simulated. Experimental results for classification on the MNIST and CIFAR-10 datasets demonstrate superior performance.
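The weighted concatenation can be illustrated with a minimal sketch. The encoder depths, layer sizes, weights, and sigmoid activation below are illustrative assumptions (untrained encoders), not the authors' exact MDSAE configuration; only the idea of scaling deeper encoders' outputs more heavily and concatenating them is taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(x, layer_dims, in_dim=784):
    """Pass x through a stack of (untrained, randomly initialized) encoding layers
    and return the output of the last hidden layer."""
    h, prev = x, in_dim
    for dim in layer_dims:
        W = rng.standard_normal((prev, dim)) * 0.01  # stand-in for a trained SAE layer
        h = sigmoid(h @ W)
        prev = dim
    return h

# Three SAEs of different depths (hypothetical architectures).
architectures = [[256], [256, 128], [256, 128, 64]]
# Deeper encoders are assigned higher weights (illustrative values).
weights = [0.2, 0.3, 0.5]

x = rng.random((1, 784))  # e.g. one flattened 28x28 MNIST image
features = [w * encode(x, dims) for w, dims in zip(weights, architectures)]
representation = np.concatenate(features, axis=1)
print(representation.shape)  # (1, 448) = 256 + 128 + 64 concatenated features
```

The resulting vector would then be fed to a standard classifier; the choice of classifier and of the actual weight values is left to the paper itself.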
