Trace-driven memory simulation: a survey

As the gap between processor and memory speeds continues to widen, methods for evaluating memory system designs before they are implemented in hardware are becoming increasingly important. One such method, trace-driven memory simulation, has been the subject of intense interest among researchers and has, as a result, enjoyed rapid development and substantial improvements during the past decade. This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show that no single method is best when all criteria, including accuracy, speed, memory, flexibility, portability, expense, and ease of use, are considered. In a concluding section, we examine fundamental limitations of trace-driven simulation, and survey some recent developments in memory simulation that may overcome these bottlenecks.
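For readers unfamiliar with the technique, the core idea can be sketched in a few lines: a recorded sequence of memory addresses (the trace) is replayed against a software model of a memory structure, and hits and misses are counted. The sketch below is an illustration only and is not taken from any tool covered by the survey. The function name `simulate_direct_mapped`, the direct-mapped organization, the cache parameters, and the sample trace are all assumptions chosen for brevity.

```python
def simulate_direct_mapped(trace, num_lines=4, line_size=16):
    """Replay an address trace against a direct-mapped cache model.

    Hypothetical helper for illustration: `trace` is an iterable of
    integer byte addresses; returns a (hits, misses) tuple.
    """
    tags = [None] * num_lines          # one stored tag per cache line
    hits = misses = 0
    for addr in trace:
        block = addr // line_size      # memory block containing the address
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines       # identifies which block occupies the line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # fill the line on a miss
    return hits, misses


# A made-up six-reference trace; 0x00 and 0x40 conflict in a 4-line cache.
trace = [0x00, 0x04, 0x40, 0x00, 0x10, 0x14]
hits, misses = simulate_direct_mapped(trace)
print(f"hits={hits} misses={misses} miss ratio={misses / len(trace):.2f}")
```

Replaying the same trace with different parameters (for example `num_lines=8`) is what lets a design be evaluated before hardware exists. Collecting, storing, and processing realistic traces is where the criteria listed above, such as speed, memory, and accuracy, come into play.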
