Accelerating S3D: A GPGPU Case Study

The graphics processor (GPU) has evolved into an appealing choice for high-performance computing due to its superior memory bandwidth, raw processing power, and flexible programmability. As such, GPUs represent an excellent platform for accelerating scientific applications. This paper explores a methodology for identifying applications that present significant potential for acceleration. In particular, this work focuses on experiences from accelerating S3D, a high-fidelity turbulent reacting flow solver. The acceleration process is examined from a holistic viewpoint, and includes details that arise from different phases of the conversion. This paper also addresses the issue of floating-point accuracy and precision on the GPU, a topic of immense importance to scientific computing. Several performance experiments are conducted, and results are presented for the NVIDIA Tesla C1060 GPU. We generalize from our experiences to provide a roadmap for deploying existing scientific applications on heterogeneous GPU platforms.
