Hyperparameters Adaptation for Restricted Boltzmann Machines Based on Free Energy

The Restricted Boltzmann Machine (RBM), the building block of Deep Belief Networks (DBNs) and Deep Boltzmann Machines (DBMs), is one of the most powerful unsupervised feature detectors. Despite its success, the challenging issue of setting its hyperparameters remains. In recent years, various hyperparameter optimization (HO) algorithms have been proposed and have substantially improved the performance of many supervised learning models. However, they cannot be applied directly to the RBM because of its unsupervised learning strategy. Moreover, these HO algorithms typically have to train the model fully or partially for several iterations before the hyperparameters can be assessed, which incurs very high computational overhead, especially for deep architectures. This paper proposes a new, efficient procedure that estimates the hyperparameters online while training stacked RBMs. Specifically, we optimize the three main hyperparameters (learning rate, momentum, and weight-cost) simultaneously, based on the free energy of the RBM, by using a Gaussian process in each epoch. Extensive experiments demonstrate that the new procedure improves the performance of RBMs significantly and is superior to state-of-the-art hyperparameter optimization algorithms when training stacked RBMs.

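To make the per-epoch procedure concrete, the sketch below shows one way a free-energy-driven Gaussian-process update of the three hyperparameters could look. It is a minimal illustration under stated assumptions, not the authors' implementation: the free energy is the standard closed form for a binary RBM, the surrogate is scikit-learn's GaussianProcessRegressor, and the function names (free_energy, propose_hyperparams), the Matérn kernel, and the lower-confidence-bound selection rule are all choices of this sketch.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def free_energy(v, W, b, c):
    """Mean free energy of a binary RBM over a batch of visible vectors:
    F(v) = -v.b - sum_j log(1 + exp(c_j + (v W)_j)).
    v: (batch, n_visible), W: (n_visible, n_hidden), b: (n_visible,), c: (n_hidden,)."""
    wx_c = v @ W + c                          # pre-activations of hidden units
    hidden_term = np.logaddexp(0.0, wx_c).sum(axis=1)  # softplus, numerically stable
    return float(np.mean(-(v @ b) - hidden_term))

def propose_hyperparams(history_x, history_y, bounds, n_candidates=1000, rng=None):
    """Fit a GP to the (hyperparameter triple -> free-energy score) pairs
    observed in previous epochs and return the candidate triple minimizing
    a GP lower confidence bound.
    history_x: (n, 3) array of (learning rate, momentum, weight-cost);
    history_y: (n,) free-energy scores; bounds: (3, 2) array of (low, high)."""
    rng = rng or np.random.default_rng(0)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.asarray(history_x), np.asarray(history_y))
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_candidates, 3))
    mu, sigma = gp.predict(cand, return_std=True)
    return cand[np.argmin(mu - 2.0 * sigma)]  # exploit mean, reward uncertainty
```

In a full training loop, one would pick the first triple by hand (the GP needs at least one observation), train the RBM for one epoch with the current triple (e.g. via contrastive divergence), score the resulting model by its mean free energy on a held-out batch, append the (triple, score) pair to the history, and call propose_hyperparams to choose the triple for the next epoch.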