CrystalGPU: Transparent and Efficient Utilization of GPU Power

General-purpose computing on graphics processing units (GPGPU) has recently gained considerable attention in domains such as bioinformatics, databases, and distributed computing. GPGPU uses the GPU as a co-processor to offload computationally intensive tasks from the CPU. This study starts from the observation that a number of GPU optimizations (such as overlapping communication with computation, reusing short-lived buffers, and harnessing multi-GPU systems) can be abstracted and reused across different GPGPU applications. This paper describes CrystalGPU, a modular framework that transparently enables applications to exploit such optimizations. Our evaluation shows that CrystalGPU achieves up to 16x speedup on synthetic benchmarks while introducing negligible latency overhead.
