Batch Normalized Deep Boltzmann Machines

Training Deep Boltzmann Machines (DBMs) is a challenging task in the study of deep generative models. Careless training often leads to divergence or a useless model. We find that this behavior stems from shifts in the input signals of DBM layers as the model parameters are updated, much as in deterministic deep networks such as Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs). These shifting input distributions not only complicate the learning process but also produce redundant neurons that merely imitate the behavior of others. Although this issue is commonly addressed in deep learning with batch normalization, integrating the technique into the probabilistic framework of DBMs is challenging because the normalized model must satisfy two conditions: a valid energy function and consistent conditional probabilities. In this paper, we introduce Batch Normalized Deep Boltzmann Machines (BNDBMs), which meet both conditions and combine batch normalization and DBMs in a single framework. Owing to the probabilistic nature of DBMs, however, training with batch normalization differs from CNNs in three ways: i) the shift parameters β are fixed while the scale parameters γ are learned; ii) the first hidden layer is not normalized; and iii) multiple pairs of population means and variances are maintained per neuron rather than the single pair used in CNNs. We observe that the proposed BNDBMs stabilize the input signals of network layers, ease the training process, and improve model quality. More interestingly, BNDBMs can be trained successfully without pretraining, which is usually a mandatory step for existing DBMs. Experimental results on the MNIST, Fashion-MNIST, and Caltech 101 Silhouettes datasets show that BNDBMs outperform DBMs and centered DBMs in feature representation and classification accuracy (average improvements of 3.98% with pretraining and 5.84% without pretraining).
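The following is a minimal sketch of how a batch-normalized pre-activation for one DBM hidden layer might look under the three differences listed above, assuming standard batch-normalization statistics. The names (BNLayer, n_stat_pairs, pair_idx) and the momentum-based running averages are illustrative assumptions, not the authors' implementation; per the abstract, the first hidden layer would skip this normalization.

```python
# Hypothetical sketch: batch normalization for a DBM hidden layer's pre-activation,
# with fixed shift (beta), learned scale (gamma), and several pairs of population
# statistics per neuron. Not the paper's code; names and defaults are assumptions.
import numpy as np

class BNLayer:
    def __init__(self, n_units, n_stat_pairs=2, eps=1e-5, momentum=0.9):
        self.gamma = np.ones(n_units)    # scale parameters: learned (difference i)
        self.beta = np.zeros(n_units)    # shift parameters: kept fixed (difference i)
        # Multiple pairs of running mean/variance per neuron (difference iii);
        # the number of pairs and how they are indexed are assumptions here.
        self.run_mean = np.zeros((n_stat_pairs, n_units))
        self.run_var = np.ones((n_stat_pairs, n_units))
        self.eps, self.momentum = eps, momentum

    def forward(self, pre_activation, pair_idx, training=True):
        """Normalize a (batch, n_units) pre-activation before the layer's sigmoid."""
        if training:
            mu = pre_activation.mean(axis=0)
            var = pre_activation.var(axis=0)
            # Update the running statistics for the selected pair.
            self.run_mean[pair_idx] = (self.momentum * self.run_mean[pair_idx]
                                       + (1.0 - self.momentum) * mu)
            self.run_var[pair_idx] = (self.momentum * self.run_var[pair_idx]
                                      + (1.0 - self.momentum) * var)
        else:
            mu, var = self.run_mean[pair_idx], self.run_var[pair_idx]
        x_hat = (pre_activation - mu) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```

In this sketch, gradients would only ever be taken with respect to gamma (and the DBM weights), while beta stays at its initial value, mirroring the fixed-shift/learned-scale choice described above.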
