Scalability and Parallel Efficiency of Block-Pipelined Algorithm for Direct Simulation of Turbulent Compressible Mixing Layer Flows

Data dependency is one of the main difficulties in the parallel implementation of numerical algorithms on distributed memory parallel computers. A typical example is the solution of tridiagonal linear systems of equations, which are frequently encountered in the numerical solution of PDEs. Although significant progress has been made in designing parallel algorithms suitable for distributed memory systems, this difficulty remains a problem for many algorithm designers, especially for those working in the field of CFD [1]. In [2] we presented the "block pipelined algorithm", an algorithm for efficiently solving a set of tridiagonal linear systems of equations on distributed memory parallel systems. It achieves good parallel speedup while maintaining the same computational complexity as optimal sequential algorithms. The basic idea is that, when parallelization is based on domain decomposition and data dependency occurs in one space direction, parallelism can often be exploited in the other space directions. In this paper, we give a more detailed analysis of the parallel performance and scalability properties of the block pipelined algorithm. Numerical results concerning its parallel efficiency in the direct simulation of turbulent compressible mixing layer flows are also presented.
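The basic idea can be illustrated with a minimal serial sketch. It is not the implementation of [2]; it only simulates the pipeline schedule in a single process, under several assumptions made here for illustration: the sequential kernel is taken to be the Thomas algorithm, the dependent direction is split into contiguous chunks owned by `n_ranks` hypothetical processes, and the independent systems (one per grid line in the other direction) are grouped into blocks of `block_size` lines. The function name and all parameter names are hypothetical.

```python
def block_pipelined_thomas(a, b, c, d, n_ranks, block_size):
    """Serial simulation of a block-pipelined Thomas solve (illustrative sketch).

    a, b, c, d are lists of m rows with n entries each: sub-diagonal,
    main diagonal, super-diagonal and right-hand side of m independent
    tridiagonal systems. a[k][0] and c[k][n-1] are ignored.
    Returns the solutions and the number of pipeline steps used.
    """
    m, n = len(d), len(d[0])
    cp = [[0.0] * n for _ in range(m)]  # modified super-diagonal
    dp = [[0.0] * n for _ in range(m)]  # modified right-hand side
    x = [[0.0] * n for _ in range(m)]

    # Rank p owns unknowns bounds[p] .. bounds[p+1]-1 of every system.
    bounds = [p * n // n_ranks for p in range(n_ranks + 1)]
    n_blocks = -(-m // block_size)  # ceiling division
    n_steps = n_ranks + n_blocks - 1  # pipeline length of one sweep

    def lines(blk):
        return range(blk * block_size, min((blk + 1) * block_size, m))

    # Forward elimination: at a given step, rank p works on block step - p.
    # Its left neighbour finished that block one step earlier, so on a
    # real machine all ranks could work concurrently within a step.
    for step in range(n_steps):
        for p in range(n_ranks):
            blk = step - p
            if not 0 <= blk < n_blocks:
                continue  # rank p is idle (pipeline fill or drain)
            for k in lines(blk):
                for i in range(bounds[p], bounds[p + 1]):
                    if i == 0:
                        denom, rhs = b[k][0], d[k][0]
                    else:
                        denom = b[k][i] - a[k][i] * cp[k][i - 1]
                        rhs = d[k][i] - a[k][i] * dp[k][i - 1]
                    cp[k][i] = c[k][i] / denom
                    dp[k][i] = rhs / denom

    # Back substitution: the pipeline runs in the opposite rank order.
    for step in range(n_steps):
        for p in range(n_ranks):
            blk = step - (n_ranks - 1 - p)
            if not 0 <= blk < n_blocks:
                continue
            for k in lines(blk):
                for i in range(bounds[p + 1] - 1, bounds[p] - 1, -1):
                    if i == n - 1:
                        x[k][i] = dp[k][i]
                    else:
                        x[k][i] = dp[k][i] - cp[k][i] * x[k][i + 1]

    return x, 2 * n_steps
```

In this sketch the arithmetic performed on each system is exactly that of the sequential Thomas algorithm; only the order in which blocks of lines are visited changes. Each sweep takes `n_ranks + n_blocks - 1` pipeline steps, so a smaller `block_size` shortens the fill and drain phases at the price of more, smaller messages between neighbouring ranks in an actual distributed implementation.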