Restricted Boltzmann Machines and Deep Belief Networks on multi-core processors

Deep learning architectures, in contrast with shallow models, draw on biological inspiration, a goal that has challenged researchers since the idea of simulating the brain was first proposed. In particular, their many hierarchical levels of composition call for parallel implementations if training is to become acceptably fast. For this kind of performance enhancement, Graphics Processing Units (GPUs) have carved out a strong role in machine learning. In this paper, we present an approach that relies mainly on three kernels to implement both the Restricted Boltzmann Machine (RBM) and Deep Belief Network (DBN) algorithms. Instead of treating the neuron as the smallest unit of computation, each thread represents the connection between two neurons (one visible and one hidden). Although this may seem counter-intuitive, the rationale is to view a connection as performing a simple function that multiplies the clamped input by its weight. In this way we maximize the GPU workload and avoid idle cores. Moreover, we placed great emphasis on designing the kernels to avoid uncoalesced memory accesses and to take advantage of shared memory in order to reduce global memory accesses. Additionally, our approach uses a step-adaptive learning rate procedure that accelerates convergence. The approach yields very good speedups (up to 46×) over a straightforward CPU implementation when both GPU and CPU versions are tested on the MNIST database.
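To make the connection-per-thread idea concrete, the following is a minimal CUDA sketch (not the paper's actual kernel) of how one hidden-layer pass of an RBM could be organized: block j computes the pre-activation of hidden unit j, each thread multiplies one clamped visible value by its weight, and a shared-memory reduction sums the partial products. The kernel name, argument layout, and the assumption that the visible layer fits in a single thread block are illustrative choices, not details taken from the paper.

    // Sketch: one thread per connection (visible unit i, hidden unit j).
    // Assumes weights stored row-major as w[j * numVisible + i] and
    // blockDim.x a power of two >= numVisible; larger visible layers
    // would require tiling over i.
    #include <cuda_runtime.h>
    #include <math.h>

    __global__ void rbmHiddenActivation(const float *v,   // clamped visible units
                                        const float *w,   // weight matrix
                                        const float *c,   // hidden biases
                                        float *hProb,     // output: P(h_j = 1 | v)
                                        int numVisible)
    {
        extern __shared__ float partial[];   // one partial product per thread

        int i = threadIdx.x;                 // visible unit index
        int j = blockIdx.x;                  // hidden unit index

        // Each thread handles one connection: multiply the input by its weight.
        partial[i] = (i < numVisible) ? v[i] * w[j * numVisible + i] : 0.0f;
        __syncthreads();

        // Tree reduction in shared memory yields sum_i v_i * w_ij.
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (i < stride)
                partial[i] += partial[i + stride];
            __syncthreads();
        }

        // Thread 0 adds the hidden bias and applies the logistic sigmoid.
        if (i == 0)
            hProb[j] = 1.0f / (1.0f + expf(-(partial[0] + c[j])));
    }

    // Illustrative launch: one block per hidden unit, one thread per connection.
    // rbmHiddenActivation<<<numHidden, blockSize, blockSize * sizeof(float)>>>(
    //     dV, dW, dC, dHProb, numVisible);

Because consecutive threads in a block read consecutive weights w[j * numVisible + i], the global loads are coalesced, and the partial products live in shared memory rather than being re-read from global memory, which is in the spirit of the memory optimizations described above.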
