HPCTOOLKIT: tools for performance analysis of optimized parallel programs

HPCTOOLKIT is an integrated suite of tools that supports measurement, analysis, attribution, and presentation of application performance for both sequential and parallel programs. HPCTOOLKIT can pinpoint and quantify scalability bottlenecks in fully optimized parallel programs with a measurement overhead of only a few percent. Recently, new capabilities were added to HPCTOOLKIT for collecting call path profiles for fully optimized codes without any compiler support, pinpointing and quantifying bottlenecks in multithreaded programs, exploring performance information and source code using a new user interface, and displaying hierarchical space–time diagrams based on traces of asynchronous call path samples. This paper provides an overview of HPCTOOLKIT and illustrates its utility for performance analysis of parallel applications. Copyright © 2009 John Wiley & Sons, Ltd.
