A distributed-memory implementation of KIVA-3 with refinements for large eddy simulation

In spite of the tremendous advances in computer technology over the last decade, today's grand challenge problems remain beyond the numerical simulation capability of even the most advanced hardware. The prediction of turbulent flows is one example of such a problem. Large eddy simulation (LES) is a viable alternative to direct numerical simulation (DNS) of turbulent flows because LES suppresses a portion of the detailed small-scale information resolved in DNS, making the computational problem more tractable. Even with LES, however, today's challenging problems require accurate schemes and high grid resolution; in addition, a fast turnaround time for approximating the instantaneous flow behavior has become important. The objective of this study is to refine existing models and develop new approaches that provide the computational fluid dynamics tools needed for meaningful large-eddy simulation of weakly compressible turbulent flows. For this purpose, the KIVA-3 code, originally developed at Los Alamos National Laboratory, was used as both the development environment and the computational fluid dynamics tool. The accomplishments of this study fall into three distinct categories. (1) Accuracy improvements: the time accuracy of the code was made fully second-order by implementing a combination of two-stage Runge-Kutta and Adams-Bashforth schemes in the convection phase. Spatial accuracy was also improved substantially by introducing a third convection-scheme option in which central differencing and quasi-second-order upwinding (QSOU) are used in combination.
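The flavor of the second-order time integration described above can be illustrated on a model problem. The sketch below, assuming a simple scalar ODE du/dt = f(u) rather than KIVA-3's actual convection phase, uses a two-stage Runge-Kutta (midpoint) step to start and Adams-Bashforth (AB2) stepping thereafter; all names and the model problem are illustrative, not taken from the code.

```python
# Hedged sketch: second-order time integration combining a two-stage
# Runge-Kutta start-up step with Adams-Bashforth (AB2) stepping.
# The model ODE du/dt = -u is illustrative only, not from KIVA-3.
import math

def integrate(f, u0, dt, nsteps):
    """RK2 midpoint step first (AB2 needs two time levels), AB2 afterwards."""
    u = u0
    f_prev = f(u)
    # Two-stage Runge-Kutta (midpoint) start-up step
    u = u + dt * f(u + 0.5 * dt * f_prev)
    for _ in range(nsteps - 1):
        f_curr = f(u)
        # AB2: u^{n+1} = u^n + dt * (3/2 f^n - 1/2 f^{n-1})
        u = u + dt * (1.5 * f_curr - 0.5 * f_prev)
        f_prev = f_curr
    return u

def decay(u):
    # Model problem with exact solution exp(-t)
    return -u

def error(dt, T=1.0):
    n = round(T / dt)
    return abs(integrate(decay, 1.0, dt, n) - math.exp(-T))

# Halving dt should reduce the error by roughly a factor of 4,
# the signature of a fully second-order scheme.
ratio = error(0.02) / error(0.01)
```

The factor-of-four error reduction under time-step halving is the practical check that the combined scheme is indeed second-order in time.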
(2) Efficiency improvements: the single-processor efficiency of the code was improved by up to 20% by replacing the "diagonal preconditioner" with a more sophisticated symmetric Gauss-Seidel / successive over-relaxation (SGS/SSOR) preconditioner, without any increase in memory allocation. (3) Computational performance improvements: a distributed-memory implementation of the KIVA-3 code based on one-dimensional domain decomposition was successfully developed and tested. All of the essential features of KIVA-3, excluding chemical reactions, spray dynamics, and piston movement, have been parallelized. The current implementation, based on the MPI message-passing library, is available on several hardware platforms, including the SGI Origin 2000, the Cray T3E, and, recently, an Alpha Linux-based Beowulf cluster. Benchmark runs for speedup and parallel efficiency were performed on the Origin 2000 on up to 48 processors, with grid resolutions ranging from 250,000 to 4.28 million cells for the selected validation and benchmark problems. The results indicate that for these problems, 70–80% of linear speedup can be achieved on up to 48 processors, even with grids larger than one million vertices, and a parallel efficiency of 60–70% is maintained.
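To make the preconditioner change concrete, the sketch below shows one way to apply a symmetric Gauss-Seidel preconditioner, M⁻¹r with M = (D+L)D⁻¹(D+U), which is the ω = 1 special case of SSOR. It needs only the matrix itself plus sweep vectors, consistent with the claim of no extra memory allocation; the dense storage and the small test system are illustrative assumptions, not KIVA-3's actual sparse data structures.

```python
# Hedged sketch of a symmetric Gauss-Seidel (SGS) preconditioner
# application: M^{-1} r with M = (D+L) D^{-1} (D+U), i.e. SSOR with
# omega = 1.  Dense storage and the 3x3 test matrix are illustrative.

def sgs_apply(A, r):
    """Solve (D+L) y = r by a forward sweep, then (D+U) z = D y backward."""
    n = len(r)
    # Forward sweep: (D + L) y = r
    y = [0.0] * n
    for i in range(n):
        s = sum(A[i][j] * y[j] for j in range(i))
        y[i] = (r[i] - s) / A[i][i]
    # Backward sweep: (D + U) z = D y
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (A[i][i] * y[i] - s) / A[i][i]
    return z

# Preconditioned Richardson iteration on a small SPD system, showing
# the preconditioner driving the residual down rapidly.
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0]

x = [0.0, 0.0, 0.0]
for _ in range(20):
    r = [b[i] - sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
    x = [x[i] + dx for i, dx in enumerate(sgs_apply(A, r))]

residual = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(3)))
               for i in range(3))
```

Because the forward and backward sweeps reuse the matrix diagonal, lower, and upper parts in place, an SGS/SSOR preconditioner of this form can replace a diagonal preconditioner without enlarging the solver's memory footprint.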
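The one-dimensional domain decomposition underlying the distributed-memory implementation can be sketched as follows: the grid is split along one axis into contiguous slabs, each rank keeps a one-cell ghost layer on each side, and neighboring ranks exchange boundary layers before each step. In this single-process sketch, direct list copies between "ranks" stand in for the MPI point-to-point exchanges; all names are illustrative assumptions, not KIVA-3's.

```python
# Hedged sketch of 1-D slab decomposition with ghost-cell exchange.
# Direct copies between slabs stand in for MPI sends/receives.

def decompose(n, nranks):
    """Split n interior cells into contiguous slabs, as evenly as possible."""
    base, rem = divmod(n, nranks)
    sizes = [base + (1 if r < rem else 0) for r in range(nranks)]
    starts = [sum(sizes[:r]) for r in range(nranks)]
    return starts, sizes

def exchange_ghosts(slabs):
    """Fill each slab's ghost cells from its neighbors' boundary interior cells."""
    for r, slab in enumerate(slabs):
        if r > 0:               # left ghost <- left neighbor's last interior cell
            slab[0] = slabs[r - 1][-2]
        if r < len(slabs) - 1:  # right ghost <- right neighbor's first interior cell
            slab[-1] = slabs[r + 1][1]

# Global field of 10 cells distributed over 3 ranks; ghost layers start at 0.
n, nranks = 10, 3
starts, sizes = decompose(n, nranks)
global_field = [float(i) for i in range(n)]
slabs = [[0.0] + global_field[s:s + m] + [0.0]
         for s, m in zip(starts, sizes)]
exchange_ghosts(slabs)
```

In the real distributed-memory code each slab lives on a separate processor and the ghost exchange is a pair of message-passing operations per step; the communication volume per rank is fixed at one boundary layer, which is what allows the reported speedups to scale to large grids.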