Comparative benchmarking: matrix multiplication on a multicore coprocessor and a GPU

This paper reports the performance of an Intel Xeon Phi coprocessor and an Nvidia Tesla GPU for the multiplication of large matrices. Several libraries, such as Intel MKL and MAGMA, are used with different execution modes of the coprocessor. The coprocessor and the GPU are compared in terms of running time, memory requirement, and programming difficulty for the special case of matrix-matrix multiplication.
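As a rough illustration of the two quantitative metrics named above (running time and memory requirement), the following minimal sketch times a square matrix-matrix product and estimates its memory footprint. It is not the paper's benchmark code: the pure-Python `matmul` stands in for a tuned library routine, and the helper names `gemm_flops`, `gemm_bytes`, and `benchmark` are hypothetical. It assumes double-precision (8-byte) entries and the conventional count of 2n^3 floating-point operations for an n-by-n product.

```python
import time


def matmul(a, b):
    """Naive product of two square matrices stored as lists of rows."""
    bt = list(zip(*b))  # transpose b so the inner loop walks a row of a and a column of b
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def gemm_flops(n):
    """Conventional operation count for an n-by-n product (assumption: 2 * n^3)."""
    return 2 * n ** 3


def gemm_bytes(n, itemsize=8):
    """Memory for the three n-by-n operands A, B, and C (assumption: 8-byte doubles)."""
    return 3 * n * n * itemsize


def benchmark(n, repeats=3):
    """Time the product `repeats` times and report the best run."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return {
        "n": n,
        "seconds": best,
        "gflops": gemm_flops(n) / best / 1e9,
        "mib": gemm_bytes(n) / 2 ** 20,
    }


if __name__ == "__main__":
    for size in (32, 64, 128):
        print(benchmark(size))
```

On real hardware the call to `matmul` would be replaced by the vendor routine under test, with the timing loop and the derived rate left unchanged so that results from different devices remain comparable.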
