Deep hybrid networks with good out-of-sample object recognition

We introduce Deep Hybrid Networks that are robust in recognizing out-of-sample objects, i.e., objects drawn from a probability distribution different from that of the training data. The networks combine an autoencoder with stacked Restricted Boltzmann Machines (RBMs). The autoencoder extracts sparse features, which are expected to be invariant to noise in the observations. The stacked RBMs then take these sparse features as inputs and learn higher-level hierarchical features. The use of RBMs is motivated by the fact that stacked RBMs typically perform well on in-sample observations, as demonstrated in previous work. To improve robustness against local noise, we propose a variant of the hybrid network that combines sparse features with sparse connections in the autoencoder layer. Experiments show that the proposed deep networks perform well in both in-sample and out-of-sample settings, particularly when the number of training examples is small.
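The abstract describes the architecture only at a high level. As a concrete illustration, here is a minimal PyTorch sketch of the described pipeline: a sparse autoencoder front end (with an optional random connection mask standing in for the "sparse connections" variant) whose codes feed stacked RBMs trained with one-step contrastive divergence. All layer sizes, the L1 sparsity penalty, the mask density, and the names (`SparseAutoencoder`, `RBM`, `cd1_step`, `batches`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Front end: learns a sparse code of the input via an L1 activation penalty.
    The fixed random mask illustrates the 'sparse connections' variant (assumption)."""
    def __init__(self, n_visible, n_hidden, connection_density=1.0):
        super().__init__()
        self.encoder = nn.Linear(n_visible, n_hidden)
        self.decoder = nn.Linear(n_hidden, n_visible)
        # Binary mask that permanently zeroes a fraction of encoder weights.
        mask = (torch.rand(n_hidden, n_visible) < connection_density).float()
        self.register_buffer("mask", mask)

    def encode(self, x):
        # Masked encoder weights implement the sparse connectivity.
        return torch.sigmoid(F.linear(x, self.encoder.weight * self.mask,
                                      self.encoder.bias))

    def loss(self, x, sparsity_weight=1e-3):
        h = self.encode(x)
        x_hat = torch.sigmoid(self.decoder(h))
        recon = F.mse_loss(x_hat, x)        # reconstruction error
        sparsity = h.abs().mean()           # L1 penalty encourages sparse features
        return recon + sparsity_weight * sparsity

class RBM(nn.Module):
    """Binary RBM trained with one-step contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(n_hidden, n_visible))
        self.v_bias = nn.Parameter(torch.zeros(n_visible))
        self.h_bias = nn.Parameter(torch.zeros(n_hidden))

    def prop_up(self, v):
        return torch.sigmoid(F.linear(v, self.W, self.h_bias))

    def prop_down(self, h):
        return torch.sigmoid(F.linear(h, self.W.t(), self.v_bias))

    @torch.no_grad()
    def cd1_step(self, v0, lr=0.05):
        ph0 = self.prop_up(v0)              # positive phase
        h0 = torch.bernoulli(ph0)
        pv1 = self.prop_down(h0)            # one Gibbs step (negative phase)
        ph1 = self.prop_up(pv1)
        batch = v0.size(0)
        # Manual CD-1 updates: positive minus negative sufficient statistics.
        self.W += lr * (ph0.t() @ v0 - ph1.t() @ pv1) / batch
        self.v_bias += lr * (v0 - pv1).mean(0)
        self.h_bias += lr * (ph0 - ph1).mean(0)

# Greedy layer-wise pretraining (illustrative sizes; `batches` is assumed to
# yield re-iterable tensors of shape (batch, 784), e.g. flat images in [0, 1]).
ae = SparseAutoencoder(784, 256, connection_density=0.2)
rbm1, rbm2 = RBM(256, 128), RBM(128, 64)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for x in batches:                           # 1) fit the sparse autoencoder
    opt.zero_grad()
    ae.loss(x).backward()
    opt.step()
for x in batches:                           # 2) fit each RBM on the codes below
    h1 = ae.encode(x).detach()
    rbm1.cd1_step(h1)
    rbm2.cd1_step(torch.bernoulli(rbm1.prop_up(h1)))
```

The greedy layer-wise schedule mirrors the standard deep belief network recipe: each RBM is trained on the detached activations of the layer below rather than jointly with it.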
