HPCToolkit: Performance Measurement and Analysis for Supercomputers with Node-level Parallelism

Today’s largest supercomputers consist of tens of thousands of nodes, each equipped with one or more multi-core microprocessors. A challenge for performance tools is that bottlenecks in programs executing on these systems may arise from a myriad of causes. To address this problem, Rice University is developing HPCToolkit, an integrated suite of tools that supports sampling-based measurement, analysis, attribution, and presentation of application performance for fully optimized parallel programs. This paper provides a brief overview of performance analysis challenges on supercomputers with node-level parallelism, describes how HPCToolkit supports a variety of performance analysis strategies that can pinpoint and quantify impediments to scalable high performance in parallel applications both within and across nodes, and outlines some remaining