Multi-feature fusion deep networks

In this paper, we propose a novel deep networks, multi-feature fusion deep networks (MFFDN), based on denoising autoencoder. MFFDN significantly reduces the classification error while giving the interpretability of the hidden-layer feature representation in learning process. Comparing with the traditional denoising autoencoder, MFFDN mainly shows the following advantages: (1) minimally retaining a certain amount of information constrained to a given form about its input; (2) explicitly interpreting the meaning of the feature representation in one hidden layer; (3) enhancing discriminativeness of the whole networks. At last, the experiments analysis on MNIST, CIFAR-10 and SVHN prove the state-of-the-art performance improvement of MFFDN with the advantages minimally retaining information constraint and the interpreted hidden feature representation.

[1]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[2]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[3]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[4]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Fuqiang Chen,et al.  Subset based deep learning for RGB-D object recognition , 2015, Neurocomputing.

[6]  Jian Yang,et al.  Locality-Constrained Sparse Auto-Encoder for Image Classification , 2015, IEEE Signal Processing Letters.

[7]  Matthew H Tong,et al.  SUN: Top-down saliency using natural statistics , 2009, Visual cognition.

[8]  Ruslan Salakhutdinov,et al.  Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.

[9]  Nathalie Japkowicz,et al.  Nonlinear Autoassociation Is Not Equivalent to PCA , 2000, Neural Computation.

[10]  Fahad Shahbaz Khan,et al.  Portmanteau Vocabularies for Multi-Cue Image Representation , 2011, NIPS.

[11]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Diego Cabrera,et al.  Multimodal deep support vector classification with homologous features and its application to gearbox fault diagnosis , 2015, Neurocomputing.

[14]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[15]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  H. Bourlard,et al.  Auto-association by multilayer perceptrons and singular value decomposition , 1988, Biological Cybernetics.

[17]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[18]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[19]  Xiangang Li,et al.  A comparative study on selecting acoustic modeling units in deep neural networks based large vocabulary Chinese speech recognition , 2013, Neurocomputing.

[20]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[23]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[24]  Diego Cabrera,et al.  Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals , 2016 .

[25]  Honglak Lee,et al.  Sparse deep belief net model for visual area V2 , 2007, NIPS.

[26]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[27]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[28]  Qionghai Dai,et al.  Local visual feature fusion via maximum margin multimodal deep neural network , 2016, Neurocomputing.