An efficient highly parallel implementation of a large air pollution model on an IBM blue gene supercomputer

In this paper we discuss the efficient distributed-memory parallelization strategy of the Unified Danish Eulerian Model (UNI-DEM). We apply an improved decomposition strategy to the spatial domain in order to get more parallel tasks (based on the larger number of subdomains) with less communications between them (due to optimization of the overlapping area when the advection-diffusion problem is solved numerically). This kind of rectangular block partitioning (with a squareshape trend) allows us not only to increase significantly the number of potential parallel tasks, but also to reduce the local memory requirements per task, which is critical for the distributed-memory implementation of the higher-resolution/finergrid versions of UNI-DEM on some parallel systems, and particularly on the IBM BlueGene/P platform - our target hardware. We will show by experiments that our new parallel implementation can use rather efficiently the resources of the powerful IBM BlueGene/P supercomputer, the largest in Bulgar...