Low-overhead call path profiling of unmodified, optimized code

Call path profiling associates resource consumption with the calling context in which the resources were consumed. We describe the design and implementation of a low-overhead call path profiler based on stack sampling. The profiler uses a novel sample-driven strategy to collect frequency counts for call graph edges without instrumenting every procedure's code to count them. Its data structures and algorithms are efficient enough to construct the complete calling context tree exposed during sampling. The profiler leverages information that compilers record for debugging or exception handling, so it can record call path profiles even for highly optimized code. We describe an implementation for the Tru64/Alpha platform. Experiments profiling the SPEC CPU2000 benchmark suite demonstrate the profiler's low overhead (2% to 7%). A comparison with instrumentation-based profilers, such as gprof, shows that for call-intensive programs our sampling-based strategy incurs more than an order of magnitude less overhead.
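
To illustrate the calling context tree mentioned above, the following is a minimal sketch in Python, not the profiler's actual implementation. It assumes each stack sample has already been unwound into a list of procedure names, outermost frame first. The names `CCTNode`, `record_sample`, and `inclusive_samples` are hypothetical, and the sample-driven edge-count strategy is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class CCTNode:
    """One calling context: a procedure reached through one specific call path."""
    name: str
    samples: int = 0  # samples taken while this context was the innermost frame
    children: Dict[str, "CCTNode"] = field(default_factory=dict)

    def child(self, name: str) -> "CCTNode":
        """Return the child context for `name`, creating it on first use."""
        node = self.children.get(name)
        if node is None:
            node = self.children[name] = CCTNode(name)
        return node


def record_sample(root: CCTNode, call_path: Iterable[str]) -> None:
    """Insert one sampled call path (outermost frame first) into the tree.

    Assumption: frames are plain procedure names; a native profiler would
    work with return addresses instead.
    """
    node = root
    for frame in call_path:
        node = node.child(frame)
    node.samples += 1  # charge the sample to the innermost context


def inclusive_samples(node: CCTNode) -> int:
    """Samples attributed to this context plus everything it called."""
    return node.samples + sum(inclusive_samples(c) for c in node.children.values())
```

Because each node is keyed by its full path from the root, the same procedure reached through two different callers is stored as two separate contexts, which is the distinction a flat call graph profile cannot make.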
