Multiresolution Deep Belief Networks

Motivated by the observation that coarse and fine resolutions of an image reveal different structures in the underlying visual phenomenon, we present a model based on the Deep Belief Network (DBN) that learns features from a multiscale representation of images. A Laplacian Pyramid is first constructed for each image. DBNs are then trained separately at each level of the pyramid. Finally, a top-level restricted Boltzmann machine (RBM) combines these DBNs into a single network, which we call the Multiresolution Deep Belief Network (MrDBN). Experiments show that MrDBNs generalize better than standard DBNs on NORB classification and TIMIT phone recognition. In the domain of generative learning, we demonstrate that MrDBNs are superior at modeling face images.
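The three-stage pipeline described above (pyramid, per-level models, top-level RBM) can be sketched roughly as follows. This is a minimal, hypothetical NumPy illustration, not the authors' implementation, and it rests on several assumptions. It uses a 2x2 box filter and nearest-neighbour expansion instead of the Gaussian kernel of the Burt and Adelson pyramid. A single Gaussian-Bernoulli RBM per level stands in for each per-level DBN. The top RBM is assumed to receive the concatenated hidden probabilities of the level models. Every model is trained with one-step contrastive divergence (CD-1). All function names (`laplacian_pyramid`, `reconstruct`, `train_mrdbn_sketch`) and hyperparameters are illustrative.

```python
import numpy as np

def downsample(img):
    # Crude REDUCE: crop to even size, average 2x2 blocks (box filter, not Gaussian).
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

def upsample(img, shape):
    # Crude EXPAND: nearest-neighbour replication, edge-padded back to `shape`.
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return np.pad(up, ((0, shape[0] - up.shape[0]), (0, shape[1] - up.shape[1])), mode="edge")

def laplacian_pyramid(img, levels):
    # Returns band-pass residuals L_0..L_{levels-2} followed by the coarsest low-pass image.
    pyramid, g = [], img.astype(float)
    for _ in range(levels - 1):
        g_next = downsample(g)
        pyramid.append(g - upsample(g_next, g.shape))
        g = g_next
    pyramid.append(g)
    return pyramid

def reconstruct(pyramid):
    # Inverts laplacian_pyramid: expand the coarse image and add back each residual.
    g = pyramid[-1]
    for lap in reversed(pyramid[:-1]):
        g = lap + upsample(g, lap.shape)
    return g

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    # Bernoulli hidden units; visibles are unit-variance Gaussian or Bernoulli.
    def __init__(self, n_visible, n_hidden, rng, gaussian_visible=False):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)
        self.c = np.zeros(n_hidden)
        self.rng, self.gaussian_visible = rng, gaussian_visible

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_mean(self, h):
        a = h @ self.W.T + self.b
        return a if self.gaussian_visible else sigmoid(a)

    def cd1_step(self, v0, lr):
        # One full-batch CD-1 update: sample h0, mean-field reconstruction v1.
        h0 = self.hidden_probs(v0)
        h_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_mean(h_sample)
        h1 = self.hidden_probs(v1)
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / v0.shape[0]
        self.b += lr * (v0 - v1).mean(axis=0)
        self.c += lr * (h0 - h1).mean(axis=0)

def train_mrdbn_sketch(images, levels=3, n_hidden=20, n_top=30, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    pyramids = [laplacian_pyramid(im, levels) for im in images]
    level_feats = []
    for k in range(levels):
        # One model per pyramid level, trained independently on standardised pixels.
        x = np.stack([p[k].ravel() for p in pyramids])
        x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
        rbm = RBM(x.shape[1], n_hidden, rng, gaussian_visible=True)
        for _ in range(epochs):
            rbm.cd1_step(x, lr=0.01)
        level_feats.append(rbm.hidden_probs(x))
    # Top-level RBM joins the per-level representations.
    joint = np.hstack(level_feats)
    top = RBM(joint.shape[1], n_top, rng)
    for _ in range(epochs):
        top.cd1_step(joint, lr=0.05)
    return top.hidden_probs(joint)

images = np.random.default_rng(1).random((32, 16, 16))  # toy random data, not NORB or TIMIT
codes = train_mrdbn_sketch(images)
print(codes.shape)  # (32, 30)
```

Each level is standardised before training because the unit-variance Gaussian visible units in this sketch assume roughly zero-mean, unit-variance inputs. Because the pyramid is built from residuals, `reconstruct` recovers the original image up to floating-point error, so no information is lost by splitting it across levels.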