An improved RBM based on Bayesian Regularization

The Restricted Boltzmann Machine (RBM) is a fundamental building block of deep learning networks. Training for generalization is an ill-posed problem: many different networks may achieve the training goal, yet each responds differently to unseen inputs. Traditional remedies include stopping the training early and/or restricting the size of the network. These approaches ameliorate the problem of over-fitting, in which the network learns the patterns presented but is unable to generalize. Bayesian regularization addresses these issues by driving the weights of the network toward minimal magnitudes. This ensures that non-contributing weights are reduced significantly and that the resulting network captures the essential relationships in the training data. Bayesian regularization simply introduces an additional term into the objective function, consisting of the sum of the squares of the weights. The optimization process therefore not only achieves the objective of the original cost (i.e. the minimization of an error metric) but also ensures that this objective is reached with weights of minimal magnitude. We have introduced Bayesian regularization into the training of Restricted Boltzmann Machines and have applied this method in experiments on handwritten digit classification. Our experiments showed that adding Bayesian regularization to the training of RBMs improved the generalization of the trained network, reducing its recognition errors by more than 1.6%.
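To make the idea concrete, the sketch below shows one contrastive-divergence (CD-1) weight update for a binary RBM in which a sum-of-squared-weights penalty is added to the objective, which is the form of weight decay that Bayesian regularization introduces. This is a minimal illustration, not the authors' implementation: in a full Bayesian-regularization scheme the penalty strength would be re-estimated from the data (e.g. via the evidence framework), whereas here `alpha` is simply an assumed fixed hyperparameter, and the layer sizes and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.05, alpha=1e-4):
    """One CD-1 update with an added alpha * sum(W**2) penalty (weight decay)."""
    # Positive phase: hidden probabilities and a sampled hidden state.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # CD-1 gradient estimate minus the gradient of the regularization term,
    # i.e. a pull of every weight toward zero proportional to its magnitude.
    grad_W = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / v0.shape[0]
    W += lr * (grad_W - 2.0 * alpha * W)
    b_vis += lr * np.mean(v0 - v1_prob, axis=0)
    b_hid += lr * np.mean(h0_prob - h1_prob, axis=0)
    return W, b_vis, b_hid

# Toy usage on random binary data (784 visible units, as for MNIST images).
n_vis, n_hid = 784, 64
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)
batch = (rng.random((32, n_vis)) < 0.5).astype(float)
W, b_vis, b_hid = cd1_step(batch, W, b_vis, b_hid)
```

The only change relative to plain CD-1 training is the `-2.0 * alpha * W` term, which implements the gradient of the added sum-of-squares penalty and shrinks non-contributing weights toward zero over the course of training.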
