Space-efficient time-series call-path profiling of parallel applications

The performance behavior of parallel simulations often changes considerably as the simulation progresses, and the temporal patterns may vary from process to process. While call-path profiling is an established method of linking a performance problem to the context in which it occurs, call-path profiles reveal little about the temporal evolution of performance phenomena. Generating a separate call-path profile for each of thousands of iterations would capture this evolution, but may exceed the available buffer space, especially when the call tree is large and more than one metric is collected. In this paper, we present a runtime approach for the semantic compression of call-path profiles. It is based on incremental clustering of a series of single-iteration profiles and scales in terms of the number of iterations without sacrificing important performance details. Our approach keeps runtime overhead low by using only a condensed version of the profile data when calculating distances, and it accounts for process-dependent variations by making all clustering decisions locally.
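The idea of incrementally clustering single-iteration profiles under a fixed memory budget can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: a profile is modeled as a flat list with one metric value per call path, the condensed form (`condense`) simply sums neighboring call paths, the distance is the Manhattan distance between condensed forms, and the merge rule is a weighted average. The names `Cluster`, `condense`, `distance`, and `cluster_iterations` are hypothetical. Each process would run this independently on its own profiles, which corresponds to making clustering decisions locally.

```python
from itertools import combinations


def condense(profile, width=2):
    """Hypothetical condensed form: sums over groups of `width` call paths."""
    return [sum(profile[i:i + width]) for i in range(0, len(profile), width)]


def distance(sig_a, sig_b):
    """Manhattan distance between two condensed signatures (an assumption)."""
    return sum(abs(x - y) for x, y in zip(sig_a, sig_b))


class Cluster:
    """One stored profile standing in for one or more iterations."""

    def __init__(self, profile, iteration):
        self.profile = list(profile)
        self.iterations = [iteration]
        self.signature = condense(self.profile)

    def merge(self, other):
        # Weighted average, so every represented iteration counts equally.
        n, m = len(self.iterations), len(other.iterations)
        self.profile = [(n * x + m * y) / (n + m)
                        for x, y in zip(self.profile, other.profile)]
        self.iterations += other.iterations
        self.signature = condense(self.profile)


def cluster_iterations(profiles, max_clusters):
    """Keep at most `max_clusters` profiles while iterations stream in."""
    clusters = []
    for iteration, profile in enumerate(profiles):
        clusters.append(Cluster(profile, iteration))
        if len(clusters) > max_clusters:
            # Merge the closest pair; only condensed signatures are compared.
            i, j = min(
                combinations(range(len(clusters)), 2),
                key=lambda p: distance(clusters[p[0]].signature,
                                       clusters[p[1]].signature),
            )
            absorbed = clusters.pop(j)  # i < j, so index i stays valid
            clusters[i].merge(absorbed)
    return clusters


# Four iterations, four call paths each; iteration 2 behaves differently.
profiles = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [10.0, 10.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
result = cluster_iterations(profiles, max_clusters=2)
for cluster in result:
    print(cluster.iterations, cluster.profile)
```

In this sketch the number of stored profiles is bounded by `max_clusters` regardless of how many iterations are processed, while an iteration that behaves differently from the rest keeps its own cluster instead of being averaged away.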
