Offload Compiler Runtime for the Intel® Xeon Phi Coprocessor

The Intel® Xeon Phi coprocessor platform has a new software stack that enables new programming models. One such model is the offload of computation from a host processor to a coprocessor that is itself a fully capable Intel® Architecture CPU, namely the Intel® Xeon Phi coprocessor. The purpose of offload is to improve response time, throughput, or both. This paper presents the compiler offload runtime infrastructure for the Intel® Xeon Phi coprocessor. It comprises a production C/C++ and Fortran compiler that supports offload to the coprocessor and the underlying Intel® Many Integrated Core (Intel® MIC) platform software stack that makes offload possible. The paper shares insights gained from a multi-year, intensive development effort. It answers end users' questions about offload with the compiler offload runtime: why offload to a coprocessor is useful, how it is specified, and under what conditions it is profitable. It also serves as a guide for potential third-party developers of offload runtimes. Examples include a gcc-based offload compiler, ports of existing commercial offloading compilers such as CAPS® to the Intel® Xeon Phi coprocessor, and the third-party offload library vendors Intel is working with, such as NAG® and MAGMA®. The paper describes the software architecture and design of the offload compiler runtime and enumerates the key performance features of this heterogeneous computing stack, which relate to initialization, data movement, and invocation. Finally, it evaluates the performance impact of those features on a set of directed micro-benchmarks and larger workloads.
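The abstract ties the profitability of offload to costs related to initialization, data movement, and invocation. As a rough illustration only, the sketch below assumes a simple additive model in which offload pays off when t_init / n + t_invoke + (B_in + B_out) / BW + t_coproc < t_host. In this inequality, the one-time initialization cost t_init is amortized over n offloads, B_in and B_out are the bytes moved to and from the coprocessor, and BW is the transfer bandwidth. This model, the struct `offload_costs`, and the functions `offload_time` and `offload_profitable` are hypothetical. They are not taken from the paper or from the Intel offload runtime.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical first-order cost model for a single offload region.
   Field names and the model itself are illustrative assumptions,
   not the offload runtime's actual interface. */
typedef struct {
    double init_s;        /* one-time coprocessor setup cost, seconds */
    double invoke_s;      /* per-offload invocation overhead, seconds */
    double bandwidth_Bps; /* host <-> coprocessor transfer rate, bytes/s */
} offload_costs;

/* Estimated wall time of running a kernel on the coprocessor, including
   data movement. The one-time init cost is amortized over n_offloads calls. */
double offload_time(const offload_costs *c,
                    double bytes_in, double bytes_out,
                    double coproc_compute_s, int n_offloads) {
    assert(c->bandwidth_Bps > 0.0 && n_offloads > 0);
    double transfer_s = (bytes_in + bytes_out) / c->bandwidth_Bps;
    return c->init_s / n_offloads + c->invoke_s + transfer_s + coproc_compute_s;
}

/* Under this model, offload is worthwhile only if the total estimated
   coprocessor time is less than the time to do the same work on the host. */
bool offload_profitable(const offload_costs *c,
                        double bytes_in, double bytes_out,
                        double coproc_compute_s, int n_offloads,
                        double host_compute_s) {
    return offload_time(c, bytes_in, bytes_out, coproc_compute_s, n_offloads)
           < host_compute_s;
}
```

Consider a single offload with 1 s of initialization, 0.5 s of invocation overhead, 2 GB moved at 1 GB/s, and 1 s of coprocessor compute. Under this model it costs an estimated 4.5 s, so it only helps if the host would take longer than that. Amortizing initialization over four offloads lowers the estimate to 3.75 s. This shows why reusing an initialized coprocessor and minimizing data movement matter in such a model.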