Efficient Parallelization of an Unstructured Grid Solver: A Memory-Centric Approach

For an unstructured-grid computational fluid dynamics computation, typical of many large-scale partial differential equation problems that require implicit treatment, we describe coding practices that lead to high implementation efficiency for standard computational and communication kernels, both on a single processor and in parallel. Moreover, a family of Newton-like preconditioned Krylov algorithms, which rely primarily on sparse Jacobian-vector multiplications and whose convergence rate degrades only slightly with increasing parallel granularity, can be expressed in terms of these kernels. All three properties (uniprocessor performance, parallel scalability, and algorithmic scalability) are required for high overall performance on the largest problems that a given generation of parallel platforms supports.
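To make the central kernel concrete, the following is a minimal sketch of a sparse matrix-vector product of the kind a Jacobian-vector multiplication reduces to. It assumes compressed sparse row (CSR) storage, which the text does not specify; the function name `csr_matvec` and the small test matrix are hypothetical and chosen only for illustration.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Return y = A @ x for a matrix A held in CSR form (assumed layout).

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i,
    col_idx[k] is the column of the k-th nonzero, values[k] its value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of a row are contiguous in memory; x is read indirectly
        # through col_idx, so the access pattern into x depends on how the
        # unknowns are ordered.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative 3x3 matrix: [[4, 1, 0], [0, 3, 0], [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 1, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_matvec(row_ptr, col_idx, values, x)
print(y)
```

In a kernel of this shape each stored nonzero is used for a single multiply-add, so the cost tends to be dominated by memory traffic rather than arithmetic, which is one way to read the "memory-centric" emphasis of the title.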