Tikhonov-Type Regularization for Restricted Boltzmann Machines

In this paper, we study a Tikhonov-type regularization for restricted Boltzmann machines (RBMs). We present two alternative formulations of the Tikhonov-type regularization, each of which encourages an RBM to learn a smoother probability distribution. Both formulations turn out to be combinations of the widely used weight-decay and sparsity regularizers. We empirically evaluate the effect of the proposed regularization schemes and show that using them can help extract more discriminative features with sparser hidden activation probabilities.
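Since the paper's formulations reduce to combinations of weight decay and sparsity regularization, a minimal sketch of how these two penalties enter a contrastive-divergence (CD-1) update may help fix ideas. The snippet below assumes a Bernoulli-Bernoulli RBM; the hyperparameter names (lambda_wd, lambda_sp, sparsity_target) and the simplified sparsity gradient on the hidden biases are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.01,
             lambda_wd=1e-4, lambda_sp=0.1, sparsity_target=0.05):
    """One CD-1 update with weight decay and a sparsity penalty.

    v0: (batch, n_visible) binary data.
    W:  (n_visible, n_hidden) weights; b, c: visible/hidden biases.
    lambda_wd penalizes large weights (weight decay); lambda_sp pushes
    the mean hidden activation toward sparsity_target. These names are
    illustrative, not taken from the paper.
    """
    # Positive phase: hidden probabilities given the data.
    h0 = sigmoid(v0 @ W + c)
    # One Gibbs step: sample hidden units, reconstruct, recompute hidden.
    h_sample = (rng.random(h0.shape) < h0).astype(v0.dtype)
    v1 = sigmoid(h_sample @ W.T + b)
    h1 = sigmoid(v1 @ W + c)

    n = v0.shape[0]
    # CD-1 estimate of the log-likelihood gradient.
    dW = (v0.T @ h0 - v1.T @ h1) / n
    db = (v0 - v1).mean(axis=0)
    dc = (h0 - h1).mean(axis=0)

    # Weight decay: gradient of -(lambda_wd / 2) * ||W||^2.
    dW -= lambda_wd * W
    # Sparsity: nudge hidden biases so mean activations approach the target
    # (a common simplification of the squared-error sparsity gradient).
    q = h0.mean(axis=0)
    dc -= lambda_sp * (q - sparsity_target)

    W += lr * dW
    b += lr * db
    c += lr * dc
    return W, b, c
```

In this sketch, weight decay shrinks the weights toward zero, which smooths the modeled distribution, while the sparsity term keeps average hidden activation probabilities near the target, consistent with the sparser hidden activations the abstract reports.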
