An 18 GFLOPS parallel climate data assimilation PSAS package

Abstract. We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to 512 nodes of an Intel Paragon. The preconditioned Conjugate Gradient solver achieves a sustained performance of 18 GFLOPS. Consequently, we achieve an unprecedented 100-fold reduction in time to solution on the Intel Paragon over a single head of a Cray C90. This not only exceeds the daily performance requirement of the Data Assimilation Office at NASA's Goddard Space Flight Center, but also makes it possible to explore much larger and more challenging data assimilation problems that are unthinkable on a traditional computer platform such as the Cray C90.