SIGMA: A Simulator Infrastructure to Guide Memory Analysis

In this paper we present SIGMA (Simulation Infrastructure to Guide Memory Analysis), a new data collection framework and family of cache analysis tools. The SIGMA environment provides detailed cache information by gathering memory reference data through software-based instrumentation. This infrastructure can facilitate quick probing into the factors that influence an application's performance by highlighting bottleneck scenarios such as excessive cache/TLB misses and inefficient data layouts. The tool can also assist in perturbation analysis to determine performance variations caused by changes to the architecture or the program. Our validation tests using the SPEC Swim benchmark show that most of the performance metrics obtained with SIGMA are within 1% of the metrics obtained with hardware performance counters, with the added advantage that SIGMA reports performance data at the level of the data structures specified by the programmer.
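To illustrate the general idea of attributing simulated cache misses to program data structures, the following is a minimal sketch and not SIGMA's implementation. It assumes a toy direct-mapped cache and a hand-written address trace; the names `DirectMappedCache`, `misses_by_structure`, and `interleaved_trace`, as well as the cache geometry and the address ranges, are hypothetical and chosen only for this example.

```python
from collections import defaultdict


class DirectMappedCache:
    """Toy direct-mapped cache (hypothetical geometry, for illustration only)."""

    def __init__(self, num_lines=4, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one stored tag per cache line

    def access(self, address):
        """Return True on a hit; on a miss, fill the line and return False."""
        block = address // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag
        return False


def misses_by_structure(trace, regions, cache):
    """Count references and misses per named address range.

    trace   -- iterable of byte addresses (the memory reference data)
    regions -- {name: (start, end)} with half-open ranges [start, end)
    """
    stats = defaultdict(lambda: {"refs": 0, "misses": 0})
    for address in trace:
        name = next(
            (n for n, (lo, hi) in regions.items() if lo <= address < hi),
            "<unknown>",
        )
        stats[name]["refs"] += 1
        if not cache.access(address):
            stats[name]["misses"] += 1
    return dict(stats)


def interleaved_trace(base_a, base_b, count=8, stride=8):
    """Addresses for a loop that reads a[i] and then b[i] on each iteration."""
    trace = []
    for i in range(count):
        trace.append(base_a + i * stride)
        trace.append(base_b + i * stride)
    return trace


# Layout 1: arrays a and b are exactly one cache size (64 bytes) apart,
# so a[i] and b[i] always compete for the same cache line.
conflict = misses_by_structure(
    interleaved_trace(0, 64), {"a": (0, 64), "b": (64, 128)}, DirectMappedCache()
)

# Layout 2: b is shifted by one cache line (16 bytes of padding).
padded = misses_by_structure(
    interleaved_trace(0, 80), {"a": (0, 64), "b": (80, 144)}, DirectMappedCache()
)

print(conflict)
print(padded)
```

In this toy setting every reference misses under the first layout, while padding `b` by one cache line halves the misses for both arrays. A per-structure breakdown of this kind is what exposes such a layout problem.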
