The Optimization of Parallel DBN Based on Spark

Deep Belief Networks (DBNs) are widely used to model and analyze many kinds of real-world problems. However, training a DBN on a single compute node easily runs into a computational bottleneck, and traditional parallel full-batch gradient descent converges slowly when used to train a DBN. To address these problems, this article proposes a parallel mini-batch gradient descent algorithm based on Spark and uses it to train DBNs. Experiments show that the method is faster than parallel full-batch gradient descent and converges to a better result when the batch size is relatively small. We use the method to train a DBN and apply it to text classification, and we also discuss how the batch size affects the network weights. The experiments show that, with a small batch size, the method improves the precision and recall of text classification compared with SVM.
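
The abstract describes the algorithm only at a high level, so the following is a minimal PySpark sketch of the general data-parallel mini-batch scheme it names: broadcast the current weights, let each worker compute gradients on a sampled mini-batch of its partition, aggregate the gradients, and update on the driver. The function names (`train_minibatch`, `grad`) and hyperparameter values are illustrative assumptions, not the paper's actual code.

```python
# A minimal sketch of data-parallel mini-batch gradient descent on Spark,
# assuming the high-level scheme in the abstract; not the authors' code.
# `grad(w, x, y)` is a hypothetical per-example gradient function, and
# `data` is an RDD of (features, label) pairs with NumPy feature vectors.
import numpy as np
from pyspark import SparkContext

def train_minibatch(sc, data, grad, dim, lr=0.1, batch_fraction=0.05,
                    iterations=100):
    """Return the weight vector after `iterations` mini-batch updates."""
    w = np.zeros(dim)
    for _ in range(iterations):
        bw = sc.broadcast(w)  # ship the current weights to every worker
        # Each worker samples roughly `batch_fraction` of its partition,
        # so the effective mini-batch is spread across the cluster.
        batch = data.sample(withReplacement=False, fraction=batch_fraction)
        grad_sum, count = batch.treeAggregate(
            (np.zeros(dim), 0),
            lambda acc, xy: (acc[0] + grad(bw.value, xy[0], xy[1]),
                             acc[1] + 1),
            lambda a, b: (a[0] + b[0], a[1] + b[1]))
        if count > 0:
            w = w - lr * grad_sum / count  # average gradient, take a step
        bw.unpersist()
    return w
```

Setting batch_fraction to 1.0 degenerates into the parallel full-batch baseline the paper compares against, which illustrates the trade-off under discussion: small batches use noisier gradients but allow many more, much cheaper, update steps per unit of cluster time. This sample-then-treeAggregate pattern mirrors how Spark MLlib's own GradientDescent handles its miniBatchFraction parameter.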
