The Next Four Orders of Magnitude in Performance for Parallel CFD

Many of the “Grand Challenges” of HPCC, ASCI, and SSI are formulated as PDEs. However, PDE simulations have struggled to hold their own among recent Bell Prize submissions, because they require a balance among architectural components that is not necessarily met in a machine designed to “max out” on the standard LINPACK benchmark. Until recently, Computational Fluid Dynamics (CFD) has successfully competed against applications with more intensive data reuse only on special-purpose machines (vector or SIMD) in statically discretized, explicit formulations. PDEs come in many varieties and complexities, but although their mathematical properties differ greatly, their computational implementations are surprisingly similar, whether the PDE is of evolution or equilibrium type. This chapter briefly reviews the algorithmic structure of typical PDE-based CFD codes that is responsible for this situation and considers possible architectural and algorithmic sources of performance improvement toward the remaining four orders of magnitude required to reach 1 Petaflop/s.