Paper Information - Distributed-Shared CUDA: Virtualization of Large-Scale GPU Systems for Programmability and Reliability

One of the difficulties for current GPGPU (General-Purpose computing on Graphics Processing Units) users is writing code that uses multiple GPUs. One limiting factor is that only a few GPUs can be attached to a single PC, so MPI (Message Passing Interface) is the common tool for using tens of GPUs or more. However, MPI-based parallel code is sometimes more complicated than serial code. In this paper, we propose DS-CUDA (Distributed-Shared Compute Unified Device Architecture), a middleware that simplifies the development of code that uses multiple GPUs distributed over a network. DS-CUDA provides a global view of GPUs at the source-code level: it virtualizes a cluster of GPU-equipped PCs so that it appears to be a single PC with many GPUs. It also provides an automated redundant-calculation mechanism to enhance the reliability of GPUs. The performance of Monte Carlo and many-body simulations is measured on a 22-node (64-GPU) fraction of the TSUBAME 2.0 supercomputer. The results indicate that DS-CUDA is a practical solution for using tens of GPUs or more.

Keywords: GPGPU; CUDA; distributed shared system; virtualization.
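To make the redundant-calculation idea concrete, the following is a minimal Python sketch of one possible approach. It is not DS-CUDA's implementation: the abstract does not describe the mechanism, so the compare-and-retry policy, the simulated fault model, and the names `run_redundant`, `FlakyDevice`, and `max_retries` are all assumptions made for illustration. The same computation runs on two simulated devices, and the result is accepted only when both copies agree.

```python
from typing import Callable, Sequence


def run_redundant(
    kernel: Callable[[int, Sequence[float]], float],
    data: Sequence[float],
    devices: Sequence[int] = (0, 1),
    max_retries: int = 3,
) -> float:
    """Run `kernel` once per device and accept the result only if all copies agree.

    Hypothetical helper: illustrates redundant calculation in general,
    not the DS-CUDA API.
    """
    for _ in range(max_retries):
        results = [kernel(dev, data) for dev in devices]
        if all(r == results[0] for r in results):
            return results[0]
        # Copies disagree: assume a transient fault and recompute on all devices.
    raise RuntimeError("redundant copies never agreed")


class FlakyDevice:
    """Simulated devices; one of them corrupts its first result (assumed fault model)."""

    def __init__(self, faulty_device: int) -> None:
        self.faulty_device = faulty_device
        self.faulted = False  # becomes True once the single fault has been injected
        self.calls = 0        # total number of kernel launches across all devices

    def square_sum(self, device: int, data: Sequence[float]) -> float:
        self.calls += 1
        result = sum(x * x for x in data)
        if device == self.faulty_device and not self.faulted:
            self.faulted = True
            return result + 1.0  # inject one wrong answer
        return result


if __name__ == "__main__":
    sim = FlakyDevice(faulty_device=1)
    value = run_redundant(sim.square_sum, [1.0, 2.0, 3.0])
    print(value, sim.calls)
```

In this sketch the first attempt disagrees because of the injected fault, so the computation is repeated on both devices and the second attempt is accepted. Exact equality is used for the comparison, which suits bit-reproducible computations; a kernel whose floating-point summation order varies between runs would need a tolerance instead.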

Tetsu Narumi | Kenji Yasuoka | Atsushi Kawai | Kazuyuki Yoshikawa

[1] Piet Hut,et al. A hierarchical O(N log N) force-calculation algorithm , 1986, Nature.

[2] Makoto Taiji,et al. 42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[3] Fumiyoshi Shoji,et al. The K computer: Japanese next-generation supercomputer development project , 2011, IEEE/ACM International Symposium on Low Power Electronics and Design.

[4] Vanish Talwar,et al. GViM: GPU-accelerated virtual machines , 2009, HPCVirt '09.

[5] Giulio Giunta,et al. A GPGPU Transparent Virtualization Component for High Performance Computing Clouds , 2010, Euro-Par.

[6] Kenli Li,et al. vCUDA: GPU-Accelerated High-Performance Computing in Virtual Machines , 2012, IEEE Trans. Computers.

[7] Amnon Barak,et al. A package for OpenCL based heterogeneous computing on clusters with many GPU devices , 2010, 2010 IEEE International Conference On Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS).

[8] Leslie Greengard,et al. A fast algorithm for particle simulations , 1987 .

[9] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.

[10] M. P. Tosi,et al. Ionic sizes and born repulsive parameters in the NaCl-type alkali halides—II: The generalized Huggins-Mayer form☆ , 1964 .

[11] R W Hockney,et al. Computer Simulation Using Particles , 1966 .

[12] Federico Silla,et al. Performance of CUDA Virtualized Remote GPUs in High Performance Clusters , 2011, 2011 International Conference on Parallel Processing.