Accelerating Computational Fluid Dynamics on the IBM Blue Gene/P Supercomputer

Computational Fluid Dynamics (CFD) is an increasingly important application domain for computational scientists. In this paper, we propose and analyze the optimizations necessary to run CFD simulations of multi-billion-cell mesh models on systems with large processor counts. Our investigation uses Code_Saturne, the general-purpose, open-source industrial Navier-Stokes CFD application developed by Électricité de France (EDF). Our work considers emerging processor features such as many-core designs, Simultaneous Multi-threading (SMT), Single Instruction Multiple Data (SIMD), Transactional Memory, and Thread-Level Speculation. As a first step, we target per-node performance by restructuring the code and data layouts to make effective use of multiple threads. We present a general loop transformation that enables the compiler to generate OpenMP threads effectively with minimal impact on the overall code structure. We also propose a renumbering scheme for mesh faces that enhances thread-level parallelism and generally improves data locality. Performance results on the IBM Blue Gene/P supercomputer and an Intel Xeon Westmere cluster are included.
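The abstract does not spell out the renumbering scheme or the loop transformation, so the following is only a minimal sketch of one common way such ideas are realized, not the authors' method. It assumes a hypothetical mesh stored as a list of `(cell0, cell1)` pairs per interior face, uses a simple greedy coloring so that no two faces in the same group touch the same cell, and then renumbers faces group by group. The function names (`color_faces`, `renumber_faces`, `accumulate_flux`) are illustrative and do not come from Code_Saturne. The code is sequential Python; the inner loop marks the place where, in a C or Fortran code, an OpenMP parallel loop could run without write conflicts.

```python
from collections import defaultdict


def color_faces(face_cells):
    """Greedily assign each face a group id so that no two faces
    in the same group share a cell (assumed scheme, for illustration)."""
    cell_groups = defaultdict(set)  # cell -> groups already used by its faces
    groups = []
    for c0, c1 in face_cells:
        used = cell_groups[c0] | cell_groups[c1]
        g = 0
        while g in used:
            g += 1
        groups.append(g)
        cell_groups[c0].add(g)
        cell_groups[c1].add(g)
    return groups


def renumber_faces(face_cells):
    """Return (new_to_old, group_index): faces sorted by group, plus the
    start offset of each group in the new numbering."""
    groups = color_faces(face_cells)
    # Stable sort keeps the original relative order inside each group.
    new_to_old = sorted(range(len(face_cells)), key=lambda f: groups[f])
    n_groups = max(groups) + 1 if groups else 0
    group_index = [0] * (n_groups + 1)
    for g in groups:
        group_index[g + 1] += 1
    for g in range(n_groups):
        group_index[g + 1] += group_index[g]
    return new_to_old, group_index


def accumulate_flux(face_cells, face_flux, n_cells, new_to_old, group_index):
    """Face-to-cell scatter written as a two-level loop over groups."""
    cell_sum = [0.0] * n_cells
    for g in range(len(group_index) - 1):
        # Faces in this range touch disjoint cells, so these iterations
        # are independent and could be shared among threads.
        for new_id in range(group_index[g], group_index[g + 1]):
            old = new_to_old[new_id]
            c0, c1 = face_cells[old]
            cell_sum[c0] += face_flux[old]
            cell_sum[c1] -= face_flux[old]
    return cell_sum


# Usage: a 1D chain of five cells joined by four faces.
faces = [(0, 1), (1, 2), (2, 3), (3, 4)]
flux = [1.0, 2.0, 3.0, 4.0]
new_to_old, group_index = renumber_faces(faces)
result = accumulate_flux(faces, flux, 5, new_to_old, group_index)
```

In this sketch the only structural change to the original scatter loop is the added outer loop over groups, which is one way a transformation can stay small relative to the surrounding code. Grouping by color alone does not guarantee good data locality; a production scheme would likely combine it with a locality-aware ordering inside each group.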
