Gaussian-Bernoulli RBMs Without Tears

We revisit the challenging problem of training Gaussian-Bernoulli restricted Boltzmann machines (GRBMs), introducing two innovations. First, we propose a novel Gibbs-Langevin sampling algorithm that outperforms existing methods such as plain Gibbs sampling. Second, we propose a modified contrastive divergence (CD) algorithm that lets GRBMs generate images starting from noise. This enables direct comparison of GRBMs with deep generative models, improving evaluation protocols in the RBM literature. Moreover, we show that modified CD combined with gradient clipping is enough to train GRBMs robustly with large learning rates, removing the need for the various tricks used in the literature. Experiments on Gaussian mixtures, MNIST, FashionMNIST, and CelebA show that GRBMs can generate good samples despite their single-hidden-layer architecture. Our code is released at https://github.com/lrjconan/GRBM.
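To make the sampling idea concrete, below is a minimal NumPy sketch of a Gibbs-Langevin-style step for a GRBM, not the authors' exact algorithm. It assumes the common GRBM energy E(v, h) = ||v - b||^2 / (2σ²) - c⊤h - v⊤Wh / σ², whose free energy is F(v) = ||v - b||^2 / (2σ²) - Σ_j softplus(c_j + (W⊤v)_j / σ²). The sketch alternates a Gibbs update of the binary hidden units with one unadjusted Langevin move on the visible units driven by the free-energy gradient, and starts the chain from Gaussian noise as in the modified-CD setting; all names (W, b, c, sigma2, eta) are illustrative, and the paper's algorithm may differ in details such as step-size schedules or Metropolis adjustment.

```python
# A minimal sketch of Gibbs-Langevin-style sampling for a GRBM.
# Illustrative only: parameter names and the exact update rule are
# assumptions, not the paper's verbatim algorithm.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def free_energy_grad(v, W, b, c, sigma2):
    """Gradient of the assumed GRBM free energy F(v) w.r.t. v."""
    # d/dv ||v - b||^2 / (2 sigma^2)            = (v - b) / sigma^2
    # d/dv -sum_j softplus(c_j + w_j^T v/sig^2) = -W @ sigmoid(...) / sigma^2
    h_prob = sigmoid(c + (W.T @ v) / sigma2)   # E[h | v]
    return (v - b) / sigma2 - (W @ h_prob) / sigma2

def gibbs_langevin_step(v, W, b, c, sigma2, eta=0.01):
    """One hybrid step: Gibbs-sample h, then one unadjusted Langevin move on v."""
    # Gibbs update of the binary hidden units (usable for CD statistics).
    h = (rng.random(c.shape) < sigmoid(c + (W.T @ v) / sigma2)).astype(float)
    # Unadjusted Langevin update of v, driven by the free-energy gradient.
    v = (v - eta * free_energy_grad(v, W, b, c, sigma2)
         + np.sqrt(2.0 * eta) * rng.standard_normal(v.shape))
    return v, h

# Toy usage: run the chain from Gaussian noise on a randomly initialized GRBM.
D, H = 4, 8                              # visible / hidden dimensions
W = 0.1 * rng.standard_normal((D, H))
b, c = np.zeros(D), np.zeros(H)
sigma2 = 1.0
v = rng.standard_normal(D)               # start from noise
for _ in range(1000):
    v, h = gibbs_langevin_step(v, W, b, c, sigma2)
print("final visible sample:", v)
```

Starting the negative chain from noise, as above, is what makes the modified-CD comparison to deep generative models possible: samples are produced without clamping to training data.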
