Many of the “Grand Challenges” of HPCC, ASCI, and SSI are formulated as PDEs. However, PDE simulations have struggled to hold their own among recent Bell Prize submissions, because they require a balance among architectural components that is not necessarily met in a machine designed to “max out” on the standard LINPACK benchmark. Until recently, Computational Fluid Dynamics (CFD) has successfully competed against applications with more intensive data reuse only on special-purpose machines (vector or SIMD) in statically discretized, explicit formulations. PDEs come in many varieties and complexities, but although their mathematical properties differ greatly, their computational implementations are surprisingly similar, whether of evolution or equilibrium type. This chapter briefly reviews the algorithmic structure of typical PDE-based CFD codes that is responsible for this situation and considers possible architectural and algorithmic sources of performance improvement toward achieving the remaining four orders of magnitude required to reach 1 Petaflop/s.