Understanding the Performance of DSM Applications

Carnival is a performance measurement and analysis tool that helps users understand the performance of distributed shared memory (DSM) applications and protocols. Using traces of program executions, Carnival presents performance data as a hierarchy of execution profiles. During analysis, Carnival automates the inference process that relates performance phenomena to specific causes in the source code or DSM protocol. Its techniques focus on the two most important sources of overhead in DSM systems. Waiting time analysis identifies the causes of synchronization overhead and produces an explanation for each source of waiting time in the program. Communication analysis identifies the sequences of requests that result in invalidations and produces an explanation for each source of communication. We describe these techniques and their implementation in TreadMarks, and show how to use waiting time analysis and communication analysis to improve the running time of two programs from the SPLASH application suite when executed on DEC Alphas connected by a DEC Memory Channel network.
