IMPLEMENTATION OF A CARTESIAN GRID INCOMPRESSIBLE NAVIER-STOKES SOLVER ON MULTI-GPU DESKTOP PLATFORMS USING CUDA

Today’s Graphics Processing Units (GPUs) are powerful computation platforms used not only for graphics rendering but also for general-purpose computation. Now reaching a teraflop of peak performance and over 100 GB/s of bandwidth, GPUs outperform the latest CPUs and provide a new high-performance computing platform. New languages such as CUDA and Brook+ allow developers to target the programmable units of GPUs without a graphics programming background. Scientists and engineers in various fields have started benefiting from the latest generations of GPUs. In this thesis, the implementation of a Navier-Stokes solver for incompressible flow around urban-like domains is presented. Transport and dispersion of contaminants in urban environments is an area of intense research. The computational fluid dynamics (CFD) models necessary to provide realistic simulations require heavy computation, usually only possible on CPU clusters. This thesis presents the base for an urban dispersion model implementation on desktop platforms, using one or multiple GPUs as coprocessors. The governing equations implemented for this thesis are common to many CFD problems that involve flow motion. Using a single Tesla C870 GPU card, the CUDA implementation of the lid-driven cavity problem runs 33 times faster than a serial C code running on a single core of a 2.4 GHz AMD Opteron processor. A speedup of 100 was reached by pairing the Tesla S870 quad-GPU system with a quad-core CPU machine. Computations for both GPU and CPU are single precision. A more complex application with support for obstacles was developed to model building effects in the domain. Using the quad-GPU system, the flow field in a domain of 1.28 km × 1.28 km × 320 m was computed. A low Reynolds number flow-field projection of 22 minutes (1000 time steps) could be simulated in 3 minutes.
The results show that urban dispersion modeling is feasible on this type of platform and that models can be run within minutes to support emergency response. More generally, they show that complex CFD problems can benefit from multi-GPU desktop architectures.
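
The performance figures quoted above imply a few derived quantities that may help when reading them. The following is a minimal sketch in Python that uses only the numbers stated in the abstract. It assumes that the 33x and 100x speedups are measured against the same serial single-core baseline, which the abstract suggests but does not state outright; the constant and function names are illustrative and do not come from the thesis.

```python
# Back-of-the-envelope figures derived only from numbers quoted in the abstract.
# All names here are hypothetical and chosen for illustration.

SIMULATED_SECONDS = 22 * 60    # 22 minutes of simulated flow
WALL_CLOCK_SECONDS = 3 * 60    # 3 minutes of compute time
TIME_STEPS = 1000

SINGLE_GPU_SPEEDUP = 33.0      # one Tesla C870 vs. one Opteron core
QUAD_GPU_SPEEDUP = 100.0       # Tesla S870 vs. the same baseline (assumption)
GPU_COUNT = 4


def summarize():
    """Return the derived quantities as a tuple of floats."""
    # Simulated seconds advanced by each time step.
    dt = SIMULATED_SECONDS / TIME_STEPS
    # Wall-clock seconds spent computing each time step.
    cost_per_step = WALL_CLOCK_SECONDS / TIME_STEPS
    # How much faster than real time the simulation runs.
    real_time_ratio = SIMULATED_SECONDS / WALL_CLOCK_SECONDS
    # Gain from going from one GPU to four, and the fraction of ideal scaling.
    scaling = QUAD_GPU_SPEEDUP / SINGLE_GPU_SPEEDUP
    efficiency = scaling / GPU_COUNT
    return dt, cost_per_step, real_time_ratio, scaling, efficiency


if __name__ == "__main__":
    dt, cost, ratio, scaling, eff = summarize()
    print(f"time step: {dt:.2f} s simulated, {cost:.2f} s wall clock")
    print(f"faster than real time by a factor of {ratio:.2f}")
    print(f"4-GPU scaling over 1 GPU: {scaling:.2f}x ({eff:.0%} of ideal)")
```

Under these assumptions, each time step advances the flow by 1.32 s and costs 0.18 s of wall-clock time, so the urban case runs about 7.3 times faster than real time. Going from one GPU to four gives roughly a 3x gain, or about 76% of ideal scaling.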
