Bio-Inspired Call-Stack Reconstruction for Performance Analysis

Correlating performance bottlenecks with their associated source code has become a cornerstone of performance analysis. It helps analysts understand why the efficiency of an application falls short of the computer's peak performance and ultimately enables optimizations of the code. To this end, performance analysis tools collect the processor call-stack and combine this information with measurements so that the analyst can comprehend the application behavior. Some tools modify the call-stack at run-time to reduce the collection cost, but doing so results in non-portable solutions. In this paper, we present a novel portable approach to associate performance issues with their source code counterparts. We capture a reduced segment of the call-stack (up to three levels) and then process the segments using an algorithm inspired by multi-sequence alignment techniques. The results of our approach are easily mapped to detailed performance views, enabling the analyst to unveil the application behavior and its corresponding region of code. To demonstrate the usefulness of our approach, we have applied the algorithm to several in-production applications that we had not seen before, describing them in fine detail and optimizing them with small modifications based on the analyses.
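To give a feel for how short call-stack segments can be combined into a longer call path, the following is a minimal sketch and not the algorithm of the paper. It assumes that each segment lists up to three frames ordered from the outermost caller to the innermost callee, and it greedily joins segments whose frames overlap, a much simpler idea than multi-sequence alignment. The names `overlap` and `stitch` and the frame names in the usage example are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: a segment holds up to three frames, outermost caller first.
Segment = Tuple[str, ...]


def overlap(left: Sequence[str], right: Sequence[str]) -> int:
    """Return the length of the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if tuple(left[-k:]) == tuple(right[:k]):
            return k
    return 0


def stitch(segments: List[Segment]) -> List[str]:
    """Greedily join overlapping segments into one call path.

    Segments that share no frame with the growing path are left out.
    Recursion or repeated frames can make the overlap ambiguous; this
    sketch does not try to resolve such cases.
    """
    if not segments:
        return []
    path = list(segments[0])
    pending = [tuple(s) for s in segments[1:]]
    changed = True
    while pending and changed:
        changed = False
        for seg in pending:
            tail = overlap(path, seg)  # seg continues the path towards the callee
            head = overlap(seg, path)  # seg precedes the path towards the caller
            if tail:
                path.extend(seg[tail:])
            elif head:
                path[:0] = seg[:len(seg) - head]
            else:
                continue
            pending.remove(seg)
            changed = True
            break  # restart the scan, since the path has changed
    return path


if __name__ == "__main__":
    samples = [
        ("main", "solve", "step"),
        ("solve", "step", "flux"),
        ("step", "flux", "mpi_wait"),
    ]
    print(stitch(samples))
```

In this usage example, three 3-level segments that share frames pairwise are joined into a single five-frame path. A real tool would have to cope with ambiguous overlaps and noisy samples, which is where an alignment-style scoring scheme becomes useful.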
