PAPI deployment, evaluation, and extensions

PAPI is a cross-platform interface to the hardware performance counters available on most modern microprocessors. These counters are a small set of registers that count events, that is, occurrences of specific signals related to processor functions. Monitoring these events has a variety of uses in application development, including performance modeling and optimization, debugging, and benchmarking. In addition to routines for accessing the counters, PAPI specifies a common set of performance metrics considered most relevant to analyzing and tuning application performance. These metrics include cycle and instruction counts, cache and memory access statistics, functional unit and pipeline status, and relevant SMP cache coherence events. PAPI is becoming a de facto industry standard and has been incorporated into several third-party research and commercial performance analysis tools. As in any physical system, the act of measuring perturbs the phenomenon being measured, and discrepancies in hardware counts and counter-based profiling data can also arise from other causes. A PET-sponsored project is deploying PAPI and related tools on DoD HPC Center platforms and is evaluating and interpreting performance counter data on those platforms.