### In All Likelihood, Deep Belief Is Not Enough

Statistical models of natural images provide an important tool for researchers in machine learning and computational neuroscience. The canonical measure for quantitatively assessing and comparing the performance of statistical models is the likelihood. Deep belief networks form one class of statistical models that has recently gained popularity and has been applied to a variety of complex data. Because their likelihood is computationally intractable, however, analyses of these models have often been limited to qualitative assessments based on samples. Motivated by this, the present article introduces a consistent estimator for the likelihood of deep belief networks that is computationally tractable and simple to apply in practice. Using this estimator, we quantitatively investigate a deep belief network for natural image patches and compare its performance to that of other models for natural image patches. We find that, in terms of likelihood, the deep belief network is outperformed even by very simple mixture models.
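
The difficulty named above is that a deep belief network's likelihood $p(\mathbf{x}) = \sum_{\mathbf{h}} p(\mathbf{x}, \mathbf{h})$ sums over exponentially many hidden states. One generic way to get a consistent estimate is importance sampling with a tractable proposal $q(\mathbf{h} \mid \mathbf{x})$, such as a feed-forward recognition distribution. Averaging the weights $p(\mathbf{x}, \mathbf{h}_n) / q(\mathbf{h}_n \mid \mathbf{x})$ over samples $\mathbf{h}_n \sim q$ gives an unbiased estimate of $p(\mathbf{x})$. Its logarithm converges to $\log p(\mathbf{x})$ as the sample size grows, although in expectation it underestimates $\log p(\mathbf{x})$ for any finite sample size.

The Python sketch below illustrates this idea under stated assumptions. It is not necessarily the estimator proposed in the article. The two-layer toy model, its parameters, and the helper names (`TinyTwoLayerModel`, `is_log_lik`) are all hypothetical. The prior over the hidden layer is an unnormalized log-linear distribution that stands in for a top-level RBM. Its partition function is computed here by brute-force enumeration, which only works because the model has three hidden units. For a realistic network it would have to be estimated separately, for example with annealed importance sampling.

```python
import itertools
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def logsumexp(values):
    # Numerically stable log(sum(exp(v) for v in values)).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def bernoulli_logpmf(bits, probs):
    # Log-probability of binary vector `bits` under independent Bernoulli(probs).
    return sum(math.log(p) if b else math.log(1.0 - p) for b, p in zip(bits, probs))


class TinyTwoLayerModel:
    """Hypothetical toy model p(x, h) = p(h) p(x | h) with binary x and h.

    p(h) is proportional to exp(sum_{i<j} J[i][j] h_i h_j + sum_k c[k] h_k),
    a stand-in for a top-level RBM; p(x | h) is a layer of sigmoid units.
    """

    def __init__(self, W, b, J, c):
        self.W, self.b, self.J, self.c = W, b, J, c  # W is D x K, J is K x K
        self.K = len(c)
        # Exact log partition function by enumerating all 2^K hidden states;
        # feasible only for tiny K (a real model would need an estimate here).
        self.log_Z = logsumexp([self.log_prior_unnorm(h) for h in self.hidden_states()])

    def hidden_states(self):
        return itertools.product((0, 1), repeat=self.K)

    def log_prior_unnorm(self, h):
        pair = sum(self.J[i][j] * h[i] * h[j]
                   for i in range(self.K) for j in range(i + 1, self.K))
        return pair + sum(ck * hk for ck, hk in zip(self.c, h))

    def log_joint(self, x, h):
        # log p(x, h) = log p(h) + log p(x | h)
        probs = [sigmoid(self.b[d] + sum(self.W[d][k] * h[k] for k in range(self.K)))
                 for d in range(len(x))]
        return self.log_prior_unnorm(h) - self.log_Z + bernoulli_logpmf(x, probs)

    def exact_log_lik(self, x):
        # log p(x) = log sum_h p(x, h), by brute force.
        return logsumexp([self.log_joint(x, h) for h in self.hidden_states()])


def is_log_lik(model, x, R, c_rec, num_samples, rng):
    """Importance-sampling estimate of log p(x) with a factorized proposal
    q(h | x) = prod_k Bernoulli(sigmoid(c_rec[k] + sum_d R[d][k] x[d]))."""
    q_probs = [sigmoid(c_rec[k] + sum(R[d][k] * x[d] for d in range(len(x))))
               for k in range(model.K)]
    log_w = []
    for _ in range(num_samples):
        h = tuple(1 if rng.random() < p else 0 for p in q_probs)
        log_w.append(model.log_joint(x, h) - bernoulli_logpmf(h, q_probs))
    # Log of the average importance weight: consistent, and in expectation
    # an underestimate of log p(x) for finite num_samples.
    return logsumexp(log_w) - math.log(num_samples)


# Usage with arbitrary illustrative parameters (D = 4 visible, K = 3 hidden units).
W = [[1.0, -0.5, 0.3], [0.2, 0.8, -1.0], [-0.7, 0.1, 0.5], [0.4, -0.3, 0.9]]
b = [0.1, -0.2, 0.0, 0.3]
J = [[0.0, 0.5, -0.4], [0.0, 0.0, 0.6], [0.0, 0.0, 0.0]]
c = [0.2, -0.1, 0.3]
model = TinyTwoLayerModel(W, b, J, c)

x = (1, 0, 1, 1)
exact = model.exact_log_lik(x)
# Reusing W as recognition weights is an arbitrary choice for this toy.
estimate = is_log_lik(model, x, R=W, c_rec=[0.0, 0.0, 0.0], num_samples=20000,
                      rng=random.Random(0))
print(f"exact log p(x) = {exact:.4f}, IS estimate = {estimate:.4f}")
```

The toy model is small enough to enumerate, so the exact value is available for comparison. With a reasonable proposal the two printed numbers should agree closely, and the gap shrinks as `num_samples` grows. The quality of $q(\mathbf{h} \mid \mathbf{x})$ matters: the closer it is to the true posterior, the lower the variance of the weights.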