DARC: dynamic analysis of root causes of latency distributions

OSprof is a versatile, portable, and efficient profiling methodology based on the analysis of latency distributions. Although OSprof offers several unique benefits and has been used to uncover interesting performance problems, the latency distributions it provides must be analyzed manually. These latency distributions are presented as histograms and contain distinct groups of data, called peaks, that characterize the overall behavior of the running code. By automating the analysis process, we make it easier to take advantage of OSprof's unique features. We have developed the Dynamic Analysis of Root Causes system (DARC), which finds root cause paths in a running program's call-graph using runtime latency analysis. A root cause path is a call-path that starts at a given function and includes the largest latency contributors to a given peak. These paths are the main causes of the high-level behavior that is represented as a peak in an OSprof histogram. DARC performs PID and call-path filtering to reduce overheads and perturbations, and can handle recursive and indirect calls. DARC can analyze preemptive behavior and asynchronous call-paths, and can also resume its analysis from a previous state, which is useful when analyzing short-running programs or specific phases of a program's execution. We present DARC and show its usefulness by analyzing behaviors observed in several interesting scenarios. We also show that DARC has negligible elapsed-time overheads for normal use cases.
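To make the abstract's terminology concrete, the following is a minimal sketch (not the authors' implementation) of an OSprof-style latency histogram: each measured latency is placed into a logarithmic bucket, so distinct behaviors (e.g., cache hits versus disk I/O) show up as separate peaks. The timing source, bucket count, and function names here are illustrative assumptions; OSprof itself instruments kernel functions and typically uses CPU cycle counters.

```c
/*
 * Illustrative sketch of OSprof-style log2 latency bucketing.
 * Not the authors' code; names and the clock source are assumptions.
 */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NBUCKETS 32

static uint64_t histogram[NBUCKETS];   /* one counter per log2 latency bucket */

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* Map a latency to its bucket: bucket i holds latencies in [2^i, 2^(i+1)) ns. */
static unsigned bucket_of(uint64_t latency_ns)
{
    unsigned b = 0;
    while (latency_ns >>= 1)
        b++;
    return b < NBUCKETS ? b : NBUCKETS - 1;
}

/* Wrap a profiled operation and record its latency in the histogram. */
static void profiled_call(void (*fn)(void))
{
    uint64_t start = now_ns();
    fn();
    histogram[bucket_of(now_ns() - start)]++;
}

static void do_work(void)
{
    volatile unsigned x = 0;
    for (unsigned i = 0; i < 10000; i++)
        x += i;
}

int main(void)
{
    for (int i = 0; i < 1000; i++)
        profiled_call(do_work);

    /* Peaks in this printout are what DARC would drill into: it instruments
     * the callees of the profiled function and descends along the ones that
     * contribute most latency to a chosen peak, yielding a root cause path. */
    for (unsigned i = 0; i < NBUCKETS; i++)
        if (histogram[i])
            printf("bucket %2u (~2^%u ns): %llu\n", i, i,
                   (unsigned long long)histogram[i]);
    return 0;
}
```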
