HPCVIEW: A Tool for Top-down Analysis of Node Performance

It is increasingly difficult for complex scientific programs to attain a significant fraction of peak performance on systems that are based on microprocessors with substantial instruction-level parallelism and deep memory hierarchies. Despite this trend, performance analysis and tuning tools are still not used regularly by algorithm and application designers. To a large extent, existing performance tools fail to meet many user needs and are cumbersome to use. To address these issues, we developed HPCVIEW—a toolkit for combining multiple sets of program profile data, correlating the data with source code, and generating a database that can be analyzed anywhere with a commodity Web browser. We argue that HPCVIEW addresses many of the issues that have limited the usability and the utility of most existing tools. We originally built HPCVIEW to facilitate our own work on data layout and optimizing compilers. Now, in addition to daily use within our group, HPCVIEW is being used by several code development teams in DoD and DoE laboratories as well as at NCSA.
