High-performance parallel implicit CFD

Fluid dynamical simulations based on finite discretizations on (quasi-)static grids scale well in parallel, but without special attention to the layout and access ordering of data they execute at a disappointing percentage of per-processor peak floating-point operation rates. We document both claims from our experience with an unstructured grid CFD code that is typical of the state of the practice at NASA. These basic performance characteristics of PDE-based codes can be understood with surprisingly simple models; we quote earlier work for the models and present primarily experimental results here. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. This snapshot of ongoing work updates our 1999 Bell Prize-winning simulation on ASCI computers.
