The Impact of Process Placement and Oversubscription on Application Performance: A Case Study for Exascale Computing

With the upcoming transition from petascale to exascale computers, radically new methods for scalable and robust computing are required. Computing at exascale speed, that is, more than 10^18 floating-point operations per second, will only be possible on systems with millions of processing units. Unfortunately, the large number of functional components, such as computing cores, memory chips, and network interfaces, will greatly increase the probability of failures, and it thus cannot be expected that an exascale application will complete its execution on exactly the same resources on which it was started. In this paper, we investigate the impact of unfavorable process placement and oversubscription of compute resources on the performance and scalability of typical application workloads such as CP2K, MOM5, and BQCD. We provide results on two HPC architectures: a Cray XC40 with proprietary Aries network routers and a dragonfly topology, and an InfiniBand cluster.