Towards massively parallel simulations of massively parallel high-performance computing systems

Many scientific communities, e.g., particle physics, weather and climate research, the biosciences, materials science, pharmaceutics, astronomy, and finance, apply the power of high-performance computing (HPC) to simulate highly complex systems and processes. Current HPC systems are themselves so complex that the design of such a system, including architecture design space exploration and performance prediction, requires HPC-like simulation capabilities. To this end, we developed an Omnest-based simulation environment that enables studying the impact of an HPC machine's communication subsystem on the overall system's performance for specific workloads. As the scale of current high-end HPC systems is in the range of hundreds of thousands of processing cores, full-system simulation at an abstraction level that still retains a reasonably high degree of detail is infeasible without resorting to parallel simulation; the main limiting factors are simulation run time and memory footprint. We describe our experiences in adapting our simulation environment to take advantage of the parallel distributed simulation capabilities provided by Omnest. We present results obtained on a many-core SMP machine as well as on a small-scale InfiniBand cluster. Furthermore, we ported our simulation environment, including Omnest itself, to the massively parallel IBM Blue Gene®/P platform, and we report results from initial experiments on this platform using up to 512 cores in parallel.
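
As a rough sketch of what taking advantage of these capabilities involves, the omnetpp.ini fragment below shows how parallel distributed simulation is typically enabled in Omnest/OMNeT++. The network and module names (HpcSystem, cluster[...]) are hypothetical placeholders, and the fragment is an illustration based on the publicly documented configuration options, not the exact setup used in this work:

    [General]
    network = HpcSystem                                     # hypothetical top-level network name
    parallel-simulation = true                              # switch on parallel distributed simulation
    parsim-communications-class = "cMPICommunications"      # exchange events between logical processes via MPI
    parsim-synchronization-class = "cNullMessageProtocol"   # conservative null-message synchronization
    # static partitioning: map groups of modules to logical processes (MPI ranks)
    *.cluster[0]**.partition-id = 0                         # modules under cluster[0] run in logical process 0
    *.cluster[1]**.partition-id = 1                         # modules under cluster[1] run in logical process 1

With such a configuration, the simulation is launched as a single MPI job (e.g., via mpirun), so that each MPI rank executes one logical process of the partitioned model.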