Marginal Deep Architecture: Stacking Feature Learning Modules to Build Deep Learning Models

Recently, many deep models have been proposed in different fields, such as image classification, object detection, and speech recognition. However, most of these architectures require a large amount of training data and rely on random initialization. In this paper, we propose to stack feature learning modules to design deep architectures. Specifically, marginal Fisher analysis (MFA) is stacked layer by layer for initialization, and we call the constructed deep architecture the marginal deep architecture (MDA). When implementing the MDA, the weight matrices of MFA are learned layer by layer, which constitutes a supervised pre-training method that does not require large-scale data. In addition, several deep learning techniques, such as backpropagation, dropout, and denoising, are applied to fine-tune the model. We have compared MDA with feature learning and deep learning models on several practical applications, including handwritten digit recognition, speech recognition, historical document understanding, and action recognition. Extensive experiments show that MDA outperforms both shallow feature learning models and related deep learning models on these tasks.
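The layer-wise initialization described above can be sketched as follows. This is a minimal illustration, not the authors' code: each layer solves an MFA generalized eigenproblem (compacting within-class neighbors while pushing apart marginal between-class pairs), uses the resulting eigenvectors as that layer's initial weight matrix, and feeds the projected data to the next layer. The function names, the `k1`/`k2` neighbor counts, the ridge regularizer, and the `tanh` nonlinearity between layers are all assumptions for the sketch.

```python
import numpy as np
from scipy.linalg import eigh

def mfa_projection(X, y, n_components, k1=5, k2=20):
    """One MFA layer (hypothetical helper, not the authors' code).
    X: (n_samples, n_features) data matrix; y: integer class labels.
    Returns a (n_features, n_components) projection matrix."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W_intr = np.zeros((n, n))  # intrinsic (within-class kNN) graph
    W_pen = np.zeros((n, n))   # penalty (between-class marginal) graph
    for i in range(n):
        same = np.where(y == y[i])[0]
        same = same[same != i]
        diff = np.where(y != y[i])[0]
        # k1 nearest neighbors of the same class.
        for j in same[np.argsort(D[i, same])[:k1]]:
            W_intr[i, j] = W_intr[j, i] = 1.0
        # k2 nearest marginal pairs from other classes.
        for j in diff[np.argsort(D[i, diff])[:k2]]:
            W_pen[i, j] = W_pen[j, i] = 1.0
    L_intr = np.diag(W_intr.sum(1)) - W_intr  # graph Laplacians
    L_pen = np.diag(W_pen.sum(1)) - W_pen
    A = X.T @ L_intr @ X
    B = X.T @ L_pen @ X + 1e-6 * np.eye(X.shape[1])  # ridge for stability
    # Generalized eigenproblem A v = lambda B v; smallest eigenvalues
    # minimize within-class scatter relative to marginal separation.
    _, vecs = eigh(A, B)
    return vecs[:, :n_components]

def stack_mfa(X, y, layer_dims):
    """Supervised layer-wise pre-training: each layer's MFA projection
    becomes that layer's initial weight matrix (before fine-tuning)."""
    weights, H = [], X
    for d in layer_dims:
        P = mfa_projection(H, y, d)
        weights.append(P)
        H = np.tanh(H @ P)  # assumed nonlinearity between layers
    return weights, H
```

After this initialization, the stacked weights would be fine-tuned end to end with backpropagation (plus dropout and denoising), as the abstract describes.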
