Techniques for Cache and Memory Simulation Using Address Reference Traces

Simulation using address reference traces is one of the primary methods for evaluating the performance of a computer system's memory hierarchy. In this paper we survey the techniques used in such simulation. In both the uniprocessor and the shared-memory multiprocessor cases, the issues can be divided into trace collection, trace storage, and trace usage. Trace collection can employ several hardware or software methods. Common concerns are that the collection method capture all of the address references of interest, that its execution overhead not be excessive, and that the resulting trace be of adequate length. The increasing size of caches heightens the concern about adequate trace length. Trace storage is a concern because traces are large, and techniques for trace compression and trace reduction have been developed to address it. Trace usage is a concern because simulations can take a long time. Under some circumstances it is possible to evaluate multiple cache sizes in a single pass of the trace. For multiprocessor traces it is also possible to simulate the trace in parallel to achieve speedup. In the multiprocessor case, the global trace problem arises: because the addresses referenced change with the environment in which a trace is collected, a trace collected in one environment cannot simply be adjusted to reflect a different environment. A relatively new technique, inline simulation, attempts to avoid a number of the problems associated with traditional trace-driven simulation.

Index Terms: address reference traces, trace-driven simulation, survey, inclusion property, trace reduction, one-pass simulation, parallel traces, global trace problem, inline simulation.
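Evaluating multiple cache sizes in a single pass of the trace relies on the inclusion property: under LRU replacement, the contents of a smaller fully associative cache are always a subset of those of a larger one with the same block size, so a single LRU stack describes every size at once (the stack-distance method of Mattson et al., 1970). The Python sketch below illustrates the idea under stated assumptions and is not the survey's own code. It assumes a fully associative LRU cache, a trace given as integer byte addresses, and a hypothetical fixed `block_size`; the function names are illustrative. A reference found at stack depth d hits in every cache holding at least d blocks, so one histogram of depths yields the miss count for any cache size.

```python
from collections import Counter


def lru_stack_distances(trace, block_size=16):
    """Return a Counter mapping LRU stack distance -> number of references.

    Distance 1 means the block was the most recently used one.
    First-time references (cold misses) are counted under float("inf").
    """
    stack = []  # block numbers, most recently used first
    hist = Counter()
    for addr in trace:
        block = addr // block_size
        try:
            depth = stack.index(block)  # 0-based position in the LRU stack
        except ValueError:
            hist[float("inf")] += 1  # never seen: misses in every cache size
        else:
            hist[depth + 1] += 1
            del stack[depth]
        stack.insert(0, block)  # the referenced block becomes most recent
    return hist


def miss_counts(hist, cache_sizes_in_blocks):
    """Misses for each fully associative LRU cache size, from one histogram.

    A cache of C blocks hits exactly the references with distance <= C.
    """
    return {
        c: sum(n for dist, n in hist.items() if dist > c)
        for c in cache_sizes_in_blocks
    }


if __name__ == "__main__":
    # Hypothetical tiny trace; block_size=1 so each address is its own block.
    trace = [1, 2, 3, 1, 2, 3, 4, 1]
    hist = lru_stack_distances(trace, block_size=1)
    print(miss_counts(hist, [1, 2, 3, 4]))  # {1: 8, 2: 8, 3: 5, 4: 4}
```

The linear search through `stack` makes this sketch O(N·M) for N references and M distinct blocks; faster one-pass simulators replace the list with a more efficient structure, but the histogram logic stays the same.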

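Trace compression exploits the fact that consecutive addresses in a trace are often close together, as in sequential instruction fetches or array walks. As a minimal lossless sketch, and not any specific published scheme, the Python code below stores each address as its signed difference from the previous one and then applies general-purpose zlib (DEFLATE) compression. The function names are hypothetical, and the sketch assumes nonnegative integer addresses whose differences fit in a signed 64-bit field.

```python
import struct
import zlib


def compress_trace(trace):
    """Losslessly compress a list of integer addresses."""
    deltas = []
    prev = 0
    for addr in trace:
        deltas.append(addr - prev)  # signed difference from the previous address
        prev = addr
    # Pack as little-endian signed 64-bit integers, then deflate at max level.
    raw = struct.pack(f"<{len(deltas)}q", *deltas)
    return zlib.compress(raw, 9)


def decompress_trace(blob):
    """Invert compress_trace, returning the original address list."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
    trace = []
    prev = 0
    for delta in deltas:
        prev += delta  # running sum restores each absolute address
        trace.append(prev)
    return trace


if __name__ == "__main__":
    # Hypothetical trace: a sequential walk over 10,000 4-byte words.
    trace = list(range(0x1000, 0x1000 + 4 * 10_000, 4))
    blob = compress_trace(trace)
    assert decompress_trace(blob) == trace
    print(len(trace) * 8, "raw bytes ->", len(blob), "compressed bytes")
```

Difference encoding helps because a sequential run of references becomes a run of identical small values, which the general-purpose compressor encodes compactly. The transformation is exactly reversible, so any simulation driven by the decompressed trace produces the same results as one driven by the original.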