Single Node Performance Analysis of Applications on HPCx

This report analyses the performance of a range of application codes on an IBM p575, which forms a single node of the Phase 3 HPCx system. We find that most codes run at between 8% and 20% of the nominal peak floating-point performance of the system. A small number of codes, which make heavy use of tuned libraries, run at between 20% and 50% of peak. For each code we also collected and analysed a range of other performance metrics derived from hardware counters on the Power5 processor. We also investigate the performance impact of enabling simultaneous multithreading (SMT): the effect ranged from a 29% slowdown to a 44% speedup. The analysis yields some interesting insights into the performance of this set of codes, but also exposes some of the shortcomings of the approach.