Accelerating Physical Simulations Using Graphics Processing Units

Abstract. Graphics processors are used in many application areas that demand high computational power. In scientific computing in particular, programming graphics processing units (GPUs) is an active field of research. Because of their hardware characteristics, graphics processors are well suited to regular parallelism, whereas implementing irregular problems requires more advanced strategies; irregular application problems are nevertheless increasingly being considered. This article presents the hardware architecture of graphics processors and several frameworks for programming them, such as CAL, Brook+, CUDA, and OpenCL, together with their specific properties. It also gives an overview of physical applications that have been implemented successfully on graphics processors. The parallel implementation of one irregular physical application on graphics processors is presented in more detail: a simulation of anomalous diffusion in porous media using random walks on random Sierpinski carpets.
