Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models

Directive-based programming models, such as OpenMP, OpenACC, and OmpSs, enable users to accelerate applications on coprocessors with little effort. These devices offer significant computing power, but their use can introduce two problems: an increase in the total cost of ownership, and underutilization, because not all codes match their architecture. Remote accelerator virtualization frameworks address both problems. In particular, rCUDA provides transparent access to any graphics processing unit (GPU) installed in a cluster, reducing the number of accelerators required and increasing their utilization ratio. Combining these two technologies, directive-based programming models and rCUDA, is therefore highly appealing. In this work, we study the integration of OmpSs and OpenACC with rCUDA, describing and analyzing several applications on three hardware configurations that comprise two InfiniBand interconnects and three NVIDIA accelerators. Our evaluation reveals favorable performance results, showing low overhead and scaling factors similar to those obtained with local devices.
