Paper information - Porting to the Intel Xeon Phi: Opportunities and Challenges

This work describes the challenges of porting code to the Intel Xeon Phi coprocessor, as well as the opportunities for optimization and tuning. We use micro-benchmarks, code segments, assembly listings, and application-level results to illustrate the key issues in porting to the Xeon Phi coprocessor, always keeping in mind both portability and performance. While executing code on the Xeon Phi in native mode is fairly straightforward, achieving good performance can be a challenge. The complexity of optimization increases as one introduces offload, distributed offload, or symmetric execution modes. We initially focus on the fundamental issues that can prevent acceptable performance in native execution, and then address the key issues in data transfers due to either offloaded regions or MPI exchanges with the host CPU. Some of the issues are generic and affect any code using heterogeneous execution (for example, the PCIe bandwidth bottleneck), while others are specific to the Xeon Phi and its software environment (for example, Host/MIC MPI exchanges). We also make an effort to indicate which issues are specific to this platform and which are of general applicability. In particular, we draw comparisons between the data management models in the Intel Xeon Phi and in the NVIDIA CUDA environment.

C. Rosales
