Achieving High Sustained Performance in an Unstructured Mesh CFD Application

This paper highlights a three-year project by an interdisciplinary team on a legacy Fortran 77 computational fluid dynamics code. Its aim is to demonstrate that implicit unstructured-grid simulations can execute at rates not far from those of explicit structured-grid codes, provided that attention is paid not only to traditional operation-count complexity but also to data-motion complexity and to the reuse of data held in the levels of the memory hierarchy closest to the processor. The demonstration code is from NASA, and the enabling parallel hardware and (freely available) software toolkit are from DOE, but the resulting methodology should be broadly applicable, and the hardware limitations it exposes should encourage programmers and vendors of parallel platforms to focus on sparse codes with indirect addressing. This snapshot of ongoing work reports a performance of 15 microseconds per degree of freedom to steady-state convergence of Euler flow on a mesh with 2.8 million vertices, using 3072 dual-processor nodes of Sandia's "ASCI Red" Intel machine; this corresponds to a sustained floating-point rate of 0.227 Tflop/s.
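The headline figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: five degrees of freedom per vertex (the conserved variables of 3D compressible Euler flow: density, three momentum components, total energy), and "15 microseconds per degree of freedom" read as total wall-clock time divided by the number of degrees of freedom. The function and constant names are hypothetical.

```python
# Back-of-envelope check of the figures quoted in the abstract.
# ASSUMPTIONS (not stated in the abstract):
#   * DOFS_PER_VERTEX = 5 (3D compressible Euler: density, three momentum
#     components, total energy), and
#   * "15 microseconds per degree of freedom" means total wall-clock time
#     to convergence divided by the number of degrees of freedom.
#   * Both processors on every node are active (for the per-processor rate).

VERTICES = 2.8e6            # mesh vertices (from the abstract)
DOFS_PER_VERTEX = 5         # assumption, see above
SECONDS_PER_DOF = 15e-6     # 15 microseconds per DOF (from the abstract)
NODES = 3072                # ASCI Red nodes used (from the abstract)
PROCS_PER_NODE = 2          # dual-processor nodes (from the abstract)
SUSTAINED_FLOPS = 0.227e12  # 0.227 Tflop/s sustained (from the abstract)


def derived_figures():
    """Return quantities implied by the abstract's numbers under the assumptions."""
    dofs = VERTICES * DOFS_PER_VERTEX
    wall_seconds = dofs * SECONDS_PER_DOF          # total time to convergence
    total_flops = SUSTAINED_FLOPS * wall_seconds   # operations over the whole run
    return {
        "dofs": dofs,
        "wall_seconds": wall_seconds,
        "total_flops": total_flops,
        "flops_per_dof": total_flops / dofs,
        "mflops_per_node": SUSTAINED_FLOPS / NODES / 1e6,
        "mflops_per_processor": SUSTAINED_FLOPS / (NODES * PROCS_PER_NODE) / 1e6,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name:>22}: {value:.4g}")
```

Under these assumptions the run takes about 210 s and performs roughly 3.4 million floating-point operations per degree of freedom. The operations-per-DOF figure is simply the sustained rate times the time per DOF, so it does not depend on the DOFs-per-vertex assumption; nor does the sustained rate per node of about 74 Mflop/s (about 37 Mflop/s per processor if both processors on each node are active).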
