Efficient Multi-training Framework of Image Deep Learning on GPU Cluster

In this paper, we develop a pipelining scheme for image deep learning on a GPU cluster to balance the heavy workload of the training procedure. Because a priori knowledge of suitable deep neural network structures is limited, it is usually necessary to train multiple models to obtain a good one. Parallel and distributed computing is therefore an obvious path forward, but the mileage varies depending on how amenable a deep network is to parallelization and on the availability of rapid prototyping capabilities with a low cost of entry. In this work, we propose a framework that organizes the training procedures of multiple deep learning models into a pipeline on a GPU cluster, where each stage is handled by a particular GPU with a partition of the training dataset. Instead of frequently migrating data among disks, CPUs, and GPUs, our framework moves only the partially trained models, reducing bandwidth consumption and leveraging the full computation capability of the cluster. We deploy the proposed framework on popular image recognition tasks, and experiments show that the proposed method reduces overall training time by up to dozens of hours compared to the baseline method.
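The rotation idea in the abstract — data partitions stay pinned to GPUs while partially trained models move between stages — can be sketched as a simple round-robin schedule. The following is a minimal illustrative sketch, not the paper's actual implementation; the function names and the equal model/GPU count are assumptions made for clarity.

```python
# Hypothetical sketch of the model-rotation pipeline from the abstract:
# each "GPU" (pipeline stage) keeps one fixed partition of the training
# data, and the partially trained models rotate through the stages, so
# only model parameters (not data) cross the interconnect.

from collections import defaultdict


def pipeline_schedule(num_models, num_stages, num_rounds):
    """Return, for each round, a model -> stage assignment.

    In round r, model m trains on stage (m + r) % num_stages, so every
    stage stays busy each round and every model eventually visits every
    data partition.
    """
    return [
        {m: (m + r) % num_stages for m in range(num_models)}
        for r in range(num_rounds)
    ]


def partitions_seen(schedule):
    """Which data partitions each model has trained on so far."""
    seen = defaultdict(set)
    for assignment in schedule:
        for m, stage in assignment.items():
            seen[m].add(stage)
    return seen


if __name__ == "__main__":
    K = 4                                # GPUs, each owning one data partition
    M = 4                                # models trained concurrently (M == K here)
    sched = pipeline_schedule(M, K, K)   # K rounds = one full sweep of the data
    for assignment in sched:
        # With M == K, every GPU hosts exactly one model per round.
        assert len(set(assignment.values())) == K
    for m, seen in partitions_seen(sched).items():
        # After K rounds each model has trained on all K partitions.
        assert seen == set(range(K))
    print("schedule valid")
```

With equal model and stage counts, the schedule keeps all GPUs busy in every round while transferring only model checkpoints between stages, which is the bandwidth-saving behavior the abstract describes.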
