A multi-level parallelization concept for high-fidelity multi-block solvers

The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools with the industrial design process benefits greatly from robust implementations that are portable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented in the widely used NASA multi-block CFD solvers employed in the ENSAERO and OVERFLOW advanced flow analysis packages. These engineering and scientific analysis packages comprise more than 300,000 lines of FORTRAN 77 code in more than 1300 individual subprograms. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity, with data partitioning as well as data coalescing, to obtain the desired load-balance characteristics on the available computer platforms for these legacy packages. The multi-level parallelism implementation itself introduces no changes to the numerical results, so the original fidelity of the packages is identically preserved. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory access. Because the combination of partitioning and coalescing options is chosen only at the execution stage, the PENS solver can be used on computer architectures ranging from shared-memory to distributed-memory platforms, with varying degrees of parallelism. Improvements in computational load balance and speed are crucial for realistic problems in the design of aerospace vehicles. The PENS implementation on the IBM SP2 distributed-memory environment at the NASA Ames Research Center obtains 85 percent scalable parallel performance with fine-grain partitioning of single-block CFD domains on up to 128 wide computational nodes.
Multi-block CFD simulations of complete aircraft geometries achieve executions at 85 percent of perfect load balance using data coalescing and the two levels of parallelism. The robustness, performance behavior, and parallel scalability of the implementation are also tested and fine-tuned for actual production-run environments on the SGI PowerChallenge, SGI Onyx2, and Cray T3E platforms.
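The interplay of data partitioning and data coalescing can be illustrated with a minimal sketch. This is not the PENS algorithm, which the text above does not specify. It assumes that each grid block is characterized only by its cell count, that oversized blocks are split into near-equal pieces (fine-grain partitioning), and that the pieces are then grouped onto processors with a greedy largest-first heuristic (coalescing). The block names, cell counts, and the functions `partition`, `coalesce`, and `balance` are hypothetical and chosen only for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Block:
    name: str   # hypothetical block label
    cells: int  # number of grid cells, used as the work estimate


def partition(blocks, max_cells):
    """Fine-grain step: split every block larger than max_cells into near-equal pieces."""
    pieces = []
    for b in blocks:
        n = -(-b.cells // max_cells)      # ceiling division: pieces needed
        base, extra = divmod(b.cells, n)  # spread the remainder over the first pieces
        for i in range(n):
            label = f"{b.name}.{i}" if n > 1 else b.name
            pieces.append(Block(label, base + (1 if i < extra else 0)))
    return pieces


def coalesce(pieces, n_procs):
    """Coarse-grain step: assign pieces, largest first, to the least-loaded processor."""
    heap = [(0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    groups = [[] for _ in range(n_procs)]
    for piece in sorted(pieces, key=lambda b: b.cells, reverse=True):
        load, p = heapq.heappop(heap)
        groups[p].append(piece)
        heapq.heappush(heap, (load + piece.cells, p))
    return groups


def balance(groups):
    """Load-balance measure: mean processor load divided by maximum load (1.0 is perfect)."""
    loads = [sum(b.cells for b in g) for g in groups]
    return (sum(loads) / len(loads)) / max(loads)


if __name__ == "__main__":
    # Assumed multi-block grid; the sizes are illustrative, not taken from the paper.
    grid = [Block("wing", 900_000), Block("fuselage", 600_000),
            Block("tail", 150_000), Block("nacelle", 100_000), Block("pylon", 50_000)]
    n_procs = 4
    target = sum(b.cells for b in grid) // n_procs  # ideal cells per processor

    print("coalescing only:      ", balance(coalesce(grid, n_procs)))
    print("partition + coalesce: ", balance(coalesce(partition(grid, target), n_procs)))
```

In this toy setting, coalescing alone is limited by the largest block, while splitting that block first lets the grouping step even out the loads. Because both choices are ordinary run-time parameters (`max_cells` and `n_procs` here), the same code could in principle be steered toward few large groups or many small ones, which mirrors the execution-stage selection described above.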