Paper information - Multi-feature fusion deep networks

Multi-feature fusion deep networks

In this paper, we propose a novel deep network, the multi-feature fusion deep network (MFFDN), based on the denoising autoencoder. MFFDN significantly reduces classification error while making the hidden-layer feature representation interpretable during learning. Compared with the traditional denoising autoencoder, MFFDN has three main advantages: (1) it retains at least a certain amount of information about its input, constrained to a given form; (2) it explicitly interprets the meaning of the feature representation in one hidden layer; (3) it enhances the discriminativeness of the whole network. Finally, experimental analysis on MNIST, CIFAR-10 and SVHN demonstrates a state-of-the-art performance improvement for MFFDN, attributed to the minimal information-retention constraint and the interpretable hidden feature representation.

Gang Ma | Bo Zhang | Xi Yang | Zhongzhi Shi
