Paper information - Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters

Abstract: NVIDIA's CUDA is a general-purpose, scalable parallel programming model for writing highly parallel applications. It provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded many-core GPUs and scales transparently to hundreds of cores; scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes. In this paper, we propose a parallel programming approach using hybrid CUDA, OpenMP, and MPI programming, which partitions loop iterations according to the number of C1060 GPU nodes in a GPU cluster consisting of one C1060 and one S1070. Loop iterations assigned to one MPI process are processed in parallel by CUDA, run by the processor cores in the same computational node.

[1] José Ranilla, et al. Neville elimination on multi- and many-core systems: OpenMP, MPI and CUDA, 2011, The Journal of Supercomputing.

[2] Chao-Tung Yang, et al. Performance-based parallel loop self-scheduling using hybrid OpenMP and MPI programming on multicore SMP clusters, 2011, Concurr. Comput. Pract. Exp.

[3] Robert Strzodka, et al. Exploring weak scalability for FEM calculations on a GPU-enhanced cluster, 2007, Parallel Comput.

[4] R. Dolbeau, et al. HMPP™: A Hybrid Multi-core Parallel Programming Environment, 2022.

[5] Dave Shreiner, et al. OpenGL(R) Programming Guide: The Official Guide to Learning OpenGL(R), Version 2.1, 2007.

[6] François Bodin, et al. Heterogeneous multicore parallel programming for graphics processing units, 2009, Sci. Program.

[7] Hesham El-Rewini, et al. Message Passing Interface (MPI), 2005.

[8] Lien Fu Lai, et al. Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters, 2009, The Journal of Supercomputing.

[9] Shenjian Chen, et al. Message Passing Interface (MPI), 2011, Encyclopedia of Parallel Computing.

[10] Tom Davis, et al. OpenGL Programming Guide: The Official Guide to Learning OpenGL, 1993.