Modeling Natural Images Using Gated MRFs

This paper describes a Markov Random Field for real-valued image modeling that has two sets of latent variables. One set gates the interactions between all pairs of pixels, while the second set determines the mean intensity of each pixel. This is a powerful model: its conditional distribution over the input is Gaussian, with both mean and covariance determined by the configuration of latent variables. Previous models, by contrast, were restricted to Gaussians with either a fixed mean or a diagonal covariance matrix. Thanks to this increased flexibility, the gated MRF can generate more realistic samples after training on an unconstrained distribution of high-resolution natural images. Furthermore, the latent variables of the model can be inferred efficiently and can be used as very effective descriptors in recognition tasks. Both generation and discrimination improve drastically as layers of binary latent variables are added to the model, yielding a hierarchical model called a Deep Belief Network.
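
To make the idea of a Gaussian whose mean and covariance both depend on the latent variables concrete, the following is a toy sketch for two pixels. It is not the paper's model: the abstract does not give the energy function, so the parameterization here (precision = identity plus gated outer products of filters, mean = covariance times the summed mean weights) is an assumption, and the names `conditional_gaussian`, `filters`, and `mean_weights` are hypothetical.

```python
import math


def conditional_gaussian(gates, mean_units, filters, mean_weights):
    """Return (mean, covariance) of a toy 2-pixel conditional Gaussian.

    Assumed parameterization (illustrative only):
      precision  = I + sum_k gates[k] * outer(filters[k], filters[k])
      covariance = inverse(precision)
      mean       = covariance @ sum_j mean_units[j] * mean_weights[j]
    """
    # Start from the identity so the precision matrix is always invertible.
    precision = [[1.0, 0.0], [0.0, 1.0]]
    for gate, f in zip(gates, filters):
        for i in range(2):
            for j in range(2):
                precision[i][j] += gate * f[i] * f[j]

    # Closed-form inverse of a 2x2 matrix.
    det = precision[0][0] * precision[1][1] - precision[0][1] * precision[1][0]
    cov = [
        [precision[1][1] / det, -precision[0][1] / det],
        [-precision[1][0] / det, precision[0][0] / det],
    ]

    # Linear term contributed by the active mean units.
    bias = [
        sum(m * w[i] for m, w in zip(mean_units, mean_weights)) for i in range(2)
    ]
    mean = [cov[i][0] * bias[0] + cov[i][1] * bias[1] for i in range(2)]
    return mean, cov


def correlation(cov):
    """Correlation coefficient between the two pixels."""
    return cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])


# One filter that penalizes the difference between the two pixels,
# and one mean unit that pushes both pixels toward the same intensity.
filters = [[3.0, -3.0]]
mean_weights = [[1.0, 1.0]]

mean_on, cov_on = conditional_gaussian([1], [1], filters, mean_weights)
mean_off, cov_off = conditional_gaussian([0], [1], filters, mean_weights)

print("gate on :", round(correlation(cov_on), 3))   # pixels strongly correlated
print("gate off:", round(correlation(cov_off), 3))  # pixels independent
```

With the gate on, the smoothness constraint is active and the two pixels become strongly correlated; with the gate off, the covariance falls back to the identity. In both cases the mean is set separately by the mean unit, which is the property that a fixed-mean or diagonal-covariance Gaussian cannot express.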
