On Construction of a Caffe Deep Learning Framework based on Intel Xeon Phi

With the increase in processor computing power has come a substantial rise in the development of scientific applications such as weather forecasting, financial market analysis, and medical technology, and with it a significantly growing need for intelligent data processing. Deep learning, as a framework able to extract abstract information from images, text, and sound, has become a challenging area in recent research. This makes both accuracy and speed essential when implementing a large neural network. In this paper, we therefore implement the Caffe deep learning framework on the Intel Xeon Phi and measure its performance in that environment. We conduct three experiments. First, we evaluate the accuracy of the Caffe framework over several numbers of training iterations on the Intel Xeon Phi. Second, to evaluate speed, we compare training time before and after optimization on the Intel Xeon E5-2650 and the Intel Xeon Phi 7210, using vectorization, OpenMP parallel processing, and the Message Passing Interface (MPI) for optimization. Third, we compare multi-node execution results on two nodes of Intel Xeon E5-2650 and two nodes of Intel Xeon Phi 7210; a sketch of how these three optimization layers typically combine follows below.
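The abstract does not reproduce the paper's optimized kernels, so the following is only a minimal C++ sketch, under assumed names (gemm_block, the gradient-averaging step, and all sizes are hypothetical), of how the three layers named above typically combine in data-parallel training: MPI distributes the mini-batch across nodes, OpenMP parallelizes loops across a node's cores, and a SIMD pragma vectorizes the innermost loop.

// Illustrative sketch only (compile with: mpicxx -fopenmp sketch.cpp).
// gemm_block and the gradient buffer below are hypothetical stand-ins,
// not the paper's actual code.
#include <mpi.h>
#include <omp.h>
#include <vector>

// Dense C += A * B, the core operation when Caffe lowers convolutions
// to matrix multiplication via im2col.
void gemm_block(const float* A, const float* B, float* C,
                int M, int N, int K) {
  #pragma omp parallel for            // OpenMP: one output row per thread
  for (int i = 0; i < M; ++i) {
    for (int k = 0; k < K; ++k) {
      const float a = A[i * K + k];
      #pragma omp simd                // vectorization of the inner loop
      for (int j = 0; j < N; ++j)
        C[i * N + j] += a * B[k * N + j];
    }
  }
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);             // MPI: one rank per Xeon / Xeon Phi node
  int rank = 0, size = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Each rank computes on its shard of the mini-batch (stubbed here
  // with small constant-filled buffers).
  const int M = 64, N = 64, K = 64;
  std::vector<float> A(M * K, 1.0f), B(K * N, 1.0f), C(M * N, 0.0f);
  gemm_block(A.data(), B.data(), C.data(), M, N, K);

  // Gradients are then averaged across nodes with an allreduce, the
  // usual data-parallel SGD pattern (buffer contents are placeholders).
  std::vector<float> grad(1024, static_cast<float>(rank));
  MPI_Allreduce(MPI_IN_PLACE, grad.data(),
                static_cast<int>(grad.size()),
                MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
  for (float& g : grad) g /= static_cast<float>(size);

  MPI_Finalize();
  return 0;
}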
