PAMDA: Performance Assessment Using MAQAO Toolset and Differential Analysis

Identifying performance bottlenecks in applications is crucial to improving their efficiency, but precisely assessing their impact on performance can be difficult: in particular, two performance problems can interact, making them hard to isolate and therefore to correct. We propose PAMDA, a methodology to single out performance problems through hierarchical bottleneck detection. Important potential performance issues are classified in a ‘Performance Breakdown Tree’, which drives our iterative analysis cycle and prioritizes the most relevant problems. Our system relies on the MAQAO toolset and on differential analysis of the code. MAQAO is a performance analysis and optimization tool suite; the differential analysis approach, implemented in the DECAN tool, consists in quantifying performance changes when controlled transformations are applied to the target code. We focus on performance issues raised by processors and memory subsystems in multicore architectures, and we demonstrate the approach on loops extracted from real-life HPC applications.
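
The core idea of differential analysis, comparing a reference measurement against measurements of deliberately transformed variants of the same loop, can be illustrated with a minimal sketch. It is not the DECAN implementation: the variant names, the cycle counts, and the `rank_bottlenecks` helper are all hypothetical, and the timings are assumed to have been measured elsewhere.

```python
from dataclasses import dataclass


@dataclass
class VariantTiming:
    """Timing of one transformed variant of a loop (hypothetical data)."""
    name: str      # illustrative label for the transformation applied
    cycles: float  # measured cycles per iteration for that variant


def rank_bottlenecks(reference_cycles, variants):
    """Rank variants by the fraction of reference time they save.

    A large saving suggests that the work removed by the transformation
    is a dominant bottleneck in the original loop.
    """
    if reference_cycles <= 0:
        raise ValueError("reference_cycles must be positive")
    savings = [
        (v.name, (reference_cycles - v.cycles) / reference_cycles)
        for v in variants
    ]
    # Largest saving first: this is the issue to investigate first.
    return sorted(savings, key=lambda item: item[1], reverse=True)


# Assumed measurements, for illustration only.
ranking = rank_bottlenecks(
    100.0,
    [
        VariantTiming("no_fp_ops", 90.0),      # floating-point work removed
        VariantTiming("no_memory_ops", 40.0),  # loads and stores removed
    ],
)
for name, saving in ranking:
    print(f"{name}: {saving:.0%} of reference time saved")
```

With these assumed numbers, removing memory operations saves far more time than removing floating-point operations, which would point the next analysis step toward the memory subsystem.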
