A compound OpenMP/MPI program development toolkit for hybrid CPU/GPU clusters

In this paper, we propose a program-development toolkit called OMPICUDA for hybrid CPU/GPU clusters. With this toolkit, users can develop applications on a hybrid CPU/GPU cluster with a familiar programming model, i.e., compound OpenMP and MPI, instead of mixing CUDA with MPI or relying on software distributed shared memory (SDSM). In addition, they can select the type of resource used to execute each parallel region of the same program by means of an extended device directive, according to the properties of that region. Moreover, the toolkit provides a set of data-partition interfaces with which users can achieve load balance at the application level, regardless of the type of resource on which their programs execute.
