Using Dynamic Tracing Sampling to Measure Long Running Programs

Detailed cache simulation helps both system developers and application writers understand an application’s performance. However, simulating a long-running program in full can be extremely slow. In this paper we present a technique that dynamically samples trace snippets throughout an application’s execution. By judiciously choosing how often samples are collected, and at which points in the control flow, our approach improves accuracy compared to sampling only a few timesteps at the beginning of execution. We validate the approach using the SIGMA tracing and simulation framework for the IBM Power family of processors.
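To make the contrast between the two sampling strategies concrete, the following is a minimal sketch, not the SIGMA implementation or the authors' algorithm. It assumes a toy direct-mapped cache, a synthetic two-phase workload, and timestep boundaries as the only sampling points; all function names and parameters (`miss_rate`, `synthetic_trace`, `prefix_sample`, `periodic_sample`, `REFS_PER_STEP`) are hypothetical and chosen for illustration.

```python
REFS_PER_STEP = 2000  # assumed number of memory references per timestep


def miss_rate(addresses, num_lines=64, line_size=64):
    """Run a toy direct-mapped cache over the addresses and return the miss rate."""
    tags = [None] * num_lines
    misses = total = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines
        if tags[index] != block:
            misses += 1
            tags[index] = block
        total += 1
    return misses / total if total else 0.0


def synthetic_trace(timesteps=100, warmup_steps=10):
    """Hypothetical workload whose behavior changes after a short initial phase.

    Early timesteps loop over a 2 KiB region that fits in the toy cache;
    later timesteps stream over a region larger than the cache.
    """
    trace = []
    for step in range(timesteps):
        for i in range(REFS_PER_STEP):
            addr = i * 8
            if step < warmup_steps:
                addr %= 2048
            trace.append(addr)
    return trace


def prefix_sample(trace, steps):
    """Keep only the first `steps` timesteps of the trace."""
    return trace[: steps * REFS_PER_STEP]


def periodic_sample(trace, period, snippet_steps=1):
    """Keep a snippet of `snippet_steps` timesteps once every `period` timesteps."""
    sample = []
    for start in range(0, len(trace), period * REFS_PER_STEP):
        sample.extend(trace[start : start + snippet_steps * REFS_PER_STEP])
    return sample


if __name__ == "__main__":
    trace = synthetic_trace()
    print("full trace:     ", miss_rate(trace))
    print("prefix sample:  ", miss_rate(prefix_sample(trace, steps=5)))
    print("periodic sample:", miss_rate(periodic_sample(trace, period=20)))
```

Both samples contain the same number of references, so the comparison isolates where the snippets are taken rather than how many are taken. In this constructed workload the prefix sample only sees the cache-friendly initial phase, while the periodic sample also covers the later streaming phase and lands closer to the full-trace miss rate. Simulating concatenated snippets with a single cache ignores cold-start effects at snippet boundaries, which a real sampling scheme would need to account for.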
