Influence of InfiniBand FDR on the performance of remote GPU virtualization

The use of GPUs to accelerate general-purpose scientific and engineering applications is mainstream today, but their adoption in current high-performance computing clusters is hindered primarily by acquisition costs and power consumption. Therefore, sharing a reduced number of GPUs among all the nodes of a cluster can bring remarkable benefits for many applications. This approach, usually referred to as remote GPU virtualization, aims to reduce the number of GPUs present in a cluster while increasing their utilization rate. The performance of the interconnection network is key to achieving reasonable performance with remote GPU virtualization. In this regard, several networking technologies with throughput comparable to that of PCI Express have appeared recently. In this paper we analyze the influence of InfiniBand FDR on the performance of remote GPU virtualization, comparing its impact on a variety of GPU-accelerated applications with that of other networking technologies, such as InfiniBand QDR and Gigabit Ethernet. Given the severe limitations of freely available remote GPU virtualization solutions, the rCUDA framework is used as the case study for this analysis. Results show that the new FDR interconnect, featuring higher bandwidth than its predecessors, reduces the overhead of using GPUs remotely, thus making this approach even more appealing.
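Why link bandwidth dominates the overhead of a remote GPU can be seen with a back-of-the-envelope model in which every host-to-device copy must cross the network instead of only the local PCI Express bus. The sketch below is a minimal illustration, not rCUDA's cost model or the methodology of this paper. It assumes nominal per-direction data rates after line encoding (InfiniBand QDR 4x: 40 Gb/s signaling with 8b/10b encoding; InfiniBand FDR 4x: 56.25 Gb/s signaling with 64b/66b encoding; a PCIe 2.0 x16 link as a hypothetical local reference), ignores latency, protocol overhead and any overlap of transfers, and uses a hypothetical 256 MiB copy. The names `NOMINAL_RATE_BYTES_PER_S`, `transfer_time_s` and `slowdown_vs_local` are illustrative.

```python
"""Back-of-the-envelope transfer-time model for remote GPU access.

Illustrative only: rates are nominal per-direction data rates after line
encoding; latency, protocol overhead and transfer overlap are ignored.
"""

# Nominal per-direction data rates in bytes per second (assumed values).
NOMINAL_RATE_BYTES_PER_S = {
    "Gigabit Ethernet": 1e9 / 8,                 # 1 Gb/s
    "InfiniBand QDR 4x": 32e9 / 8,               # 40 Gb/s signaling, 8b/10b -> 32 Gb/s of data
    "InfiniBand FDR 4x": 56.25e9 * 64 / 66 / 8,  # 56.25 Gb/s signaling, 64b/66b encoding
    "PCIe 2.0 x16 (local)": 8e9,                 # hypothetical local reference: 16 lanes x 5 GT/s, 8b/10b
}

LOCAL = "PCIe 2.0 x16 (local)"


def transfer_time_s(num_bytes: int, rate_bytes_per_s: float, latency_s: float = 0.0) -> float:
    """Idealized time to move num_bytes: a fixed latency plus size / bandwidth."""
    return latency_s + num_bytes / rate_bytes_per_s


def slowdown_vs_local(num_bytes: int, network: str) -> float:
    """Ratio of the copy time over `network` to the copy time over the local PCIe link."""
    remote = transfer_time_s(num_bytes, NOMINAL_RATE_BYTES_PER_S[network])
    local = transfer_time_s(num_bytes, NOMINAL_RATE_BYTES_PER_S[LOCAL])
    return remote / local


if __name__ == "__main__":
    size = 256 * 2**20  # hypothetical 256 MiB host-to-device copy
    for name, rate in NOMINAL_RATE_BYTES_PER_S.items():
        t_ms = transfer_time_s(size, rate) * 1e3
        print(f"{name:22s} {t_ms:9.1f} ms  x{slowdown_vs_local(size, name):5.2f} vs local")
```

Under these idealized assumptions a bulk copy takes about 64x the local time over Gigabit Ethernet, 2x over QDR and roughly 1.17x over FDR. Measured application overheads depend on latency, transfer sizes and how much of the run is spent moving data, so these ratios only indicate the trend, not the paper's results.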
