A Parallel Trace-driven Simulator: Implementation and Performance

Simulating parallel architectures requires an enormous number of CPU cycles and, in the case of trace-driven simulation, a large amount of disk storage. In this paper, we consider the evaluation of the memory hierarchy of multiprocessor systems via parallel trace-driven simulation. We refine the original algorithm of Lin et al. [8], whose main characteristic is to insert the shared references from every trace into all other traces, by reducing the amount of communication between simulation processes. We have implemented our algorithm on a KSR-1. Results of our experiments on traces of four applications and three different cache coherence protocols show that parallel trace-driven simulation yields significant speedups over its sequential counterpart. The communication overhead is not substantial compared to the dominant overhead, which is due to the processing of replicated inserted references. We also investigate filtering techniques and show how to filter private and shared references in parallel, for various block sizes, in one pass. Simulation of filtered traces is faster but achieves a lower speedup.
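The insertion step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Ref` record, its fields, the per-reference global timestamp, and the function name `insert_shared_refs` are all assumptions made for illustration. The sketch only shows how each processor's trace could be augmented with the shared references of every other processor, so that each simulation process sees the remote accesses that may affect its cache.

```python
import heapq
from typing import List, NamedTuple


class Ref(NamedTuple):
    """One trace record (hypothetical layout, not taken from the paper)."""
    time: int     # assumed global timestamp of the reference
    cpu: int      # processor that issued the reference
    op: str       # "R" for read, "W" for write
    addr: int     # memory address
    shared: bool  # True if the address belongs to shared data


def insert_shared_refs(traces: List[List[Ref]]) -> List[List[Ref]]:
    """Return one augmented trace per processor.

    Each augmented trace holds the processor's own references plus the
    shared references of every other processor, ordered by timestamp.
    Every input trace is assumed to be sorted by time already.
    """
    augmented = []
    for cpu, own in enumerate(traces):
        # Shared references issued by all the other processors.
        remote_shared = [
            [r for r in other if r.shared]
            for j, other in enumerate(traces)
            if j != cpu
        ]
        # Merge the sorted streams into one time-ordered trace.
        merged = heapq.merge(own, *remote_shared, key=lambda r: r.time)
        augmented.append(list(merged))
    return augmented


if __name__ == "__main__":
    traces = [
        [Ref(0, 0, "R", 0x10, False), Ref(2, 0, "W", 0x100, True)],
        [Ref(1, 1, "R", 0x100, True), Ref(3, 1, "R", 0x20, False)],
    ]
    for cpu, trace in enumerate(insert_shared_refs(traces)):
        print(cpu, [(r.time, r.cpu, r.op, hex(r.addr)) for r in trace])
```

In this sketch every shared reference is replicated into all other traces, which is consistent with the abstract's observation that processing replicated inserted references is the dominant overhead.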

[1]  Rassul Ayani,et al.  Parallel Cache Simulation on Multiprocessor Workstations , 1993, 1993 International Conference on Parallel Processing - ICPP'93.

[2]  Susan J. Eggers,et al.  Techniques for efficient inline tracing on a shared-memory multiprocessor , 1990, SIGMETRICS '90.

[3]  Alan Jay Smith,et al.  Two Methods for the Efficient Analysis of Memory Address Trace Data , 1977, IEEE Transactions on Software Engineering.

[4]  Eric A. Brewer,et al.  PROTEUS: a high-performance parallel-architecture simulator , 1992, SIGMETRICS '92/PERFORMANCE '92.

[5]  Eugene D. Brooks,et al.  The Cerberus Multiprocessor Simulator , 1987, PPSC.

[6]  Thomas Roberts Puzak,et al.  Analysis of cache replacement-algorithms , 1985 .

[7]  Wen-Hann Wang,et al.  Efficient trace-driven simulation methods for cache performance analysis , 1991 .

[8]  Susan J. Eggers,et al.  On the validity of trace-driven simulation for multiprocessors , 1991, ISCA '91.

[9]  James K. Archibald,et al.  Cache coherence protocols: evaluation using a multiprocessor simulation model , 1986, TOCS.

[10]  Wen-Hann Wang,et al.  Efficient trace-driven simulation methods for cache performance analysis , 1991, TOCS.

[11]  Anoop Gupta,et al.  SPLASH: Stanford parallel applications for shared-memory , 1992, CARN.

[12]  John L. Hennessy,et al.  Multiprocessor Simulation and Tracing Using Tango , 1991, ICPP.

[13]  James R. Larus,et al.  The Wisconsin Wind Tunnel: virtual prototyping of parallel computers , 1993, SIGMETRICS '93.

[14]  Anant Agarwal,et al.  Directory-based cache coherence in large-scale multiprocessors , 1990, Computer.

[15]  J. F. Wcnh 1993 International Conference on Parallel Processing , 1993 .