Identifying potential parallelism via loop-centric profiling

The transition to multithreaded, multi-core designs places a greater responsibility on programmers and software for improving performance; thread-level parallelism (TLP) will be increasingly relied upon in addition to instruction-level parallelism (ILP) and increased clock frequency. Deciding where to parallelize code is difficult, especially for large, complex applications or those whose original developers have moved on. Outer loops are relatively easy targets for parallelization, but traditional profilers focus primarily on functions and hot inner loops. To aid programmers' parallelization efforts, we introduce the concept of loop-centric profiling, which provides a hierarchical view of how much time is spent in a loop and in the loops nested within it.

This paper introduces two techniques for loop profiling. First, we describe an instrumentation-based approach that gathers highly detailed and accurate information about loop behavior. Second, we present a sampling approach that achieves similar results with negligible overhead. The paper concludes with a case study evaluating the tool on several SPEC 2000 benchmarks.
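The hierarchical view described above can be illustrated with a minimal sketch. It is not the paper's tool. It assumes each profile sample has already been resolved to the stack of loops active at that moment (outermost first), and the names `aggregate`, `report`, and the loop labels are hypothetical. For each loop it counts inclusive samples (the loop or anything nested in it was executing) and self samples (that loop was the innermost one).

```python
from collections import defaultdict


def aggregate(samples):
    """Count inclusive and self samples per loop-nest path.

    Each sample is a tuple of loop labels, outermost first.
    An empty tuple means the sample fell outside every loop.
    (Hypothetical input format, assumed for illustration.)
    """
    inclusive = defaultdict(int)
    self_counts = defaultdict(int)
    for stack in samples:
        stack = tuple(stack)
        # Every enclosing loop gets credit for this sample.
        for depth in range(1, len(stack) + 1):
            inclusive[stack[:depth]] += 1
        # Only the innermost loop gets self credit.
        if stack:
            self_counts[stack] += 1
    return dict(inclusive), dict(self_counts)


def report(samples):
    """Return indented report lines, one per loop, parents before children."""
    samples = list(samples)
    total = len(samples)
    if total == 0:
        return []
    inclusive, self_counts = aggregate(samples)
    lines = []
    # Sorting tuple paths puts each parent directly before its nested loops.
    for path in sorted(inclusive):
        indent = "  " * (len(path) - 1)
        pct = 100.0 * inclusive[path] / total
        own = self_counts.get(path, 0)
        lines.append(f"{indent}{path[-1]}: {pct:.1f}% inclusive, {own} self samples")
    return lines


if __name__ == "__main__":
    demo = [("outer",), ("outer", "inner"), ("outer", "inner"), ()]
    print("\n".join(report(demo)))
```

In a view like this, an outer loop with a high inclusive share and a low self share spends most of its time in nested loops, which is the kind of candidate a function-level or inner-loop profile tends to hide.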
