Shade: a fast instruction-set simulator for execution profiling

Tracing tools are widely used to help analyze, design, and tune both hardware and software systems. This paper describes a tool called Shade, which combines efficient instruction-set simulation with a flexible, extensible trace generation capability. Efficiency is achieved by dynamically compiling and caching code to simulate and trace the application program. The user may control the extent of tracing in a variety of ways; arbitrarily detailed application state information may be collected during the simulation, but tracing less translates directly into greater efficiency. Current Shade implementations run on SPARC systems and simulate the SPARC (Versions 8 and 9) and MIPS I instruction sets. This paper describes the capabilities, design, implementation, and performance of Shade, and discusses instruction-set emulation in general.
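
The general idea of compiling and caching simulation code can be illustrated with a minimal sketch. It is not Shade's implementation: the three-instruction toy instruction set, the names `translate`, `run`, and `PROGRAM`, and the use of Python's `exec` as a stand-in for generating host machine code are all assumptions made for illustration. Each basic block of the simulated program is translated once into a host function, stored in a cache keyed by its starting address, and reused on later visits. Tracing code is emitted only when requested, so an untraced run executes less host code per simulated instruction.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy instruction set (not SPARC or MIPS):
#   ("addi", rd, rs, imm)  regs[rd] = regs[rs] + imm
#   ("bnez", rs, target)   jump to target if regs[rs] != 0; ends a block
#   ("halt",)              stop the simulation; ends a block
Instr = Tuple
Block = Callable[[List[int], List[int]], int]


def translate(program: List[Instr], pc: int, trace_on: bool) -> Block:
    """Compile the basic block starting at pc into one host function.

    The generated function updates the simulated registers and returns the
    next simulated pc (-1 means halt). Assumes every block ends in a
    "bnez" or "halt" before the program runs out of instructions.
    """
    lines = ["def block(regs, trace):"]
    while True:
        ins = program[pc]
        if trace_on:
            # Tracing code exists in the translation only when requested.
            lines.append(f"    trace.append({pc})")
        op = ins[0]
        if op == "addi":
            _, rd, rs, imm = ins
            lines.append(f"    regs[{rd}] = regs[{rs}] + {imm}")
            pc += 1
        elif op == "bnez":
            _, rs, target = ins
            lines.append(
                f"    return {target} if regs[{rs}] != 0 else {pc + 1}")
            break
        elif op == "halt":
            lines.append("    return -1")
            break
        else:
            raise ValueError(f"unknown opcode: {op!r}")
    namespace: dict = {}
    exec("\n".join(lines), namespace)  # stand-in for emitting host code
    return namespace["block"]


def run(program: List[Instr], regs: List[int],
        trace_on: bool = False) -> Tuple[List[int], int]:
    """Simulate program; return (trace of executed pcs, translation count)."""
    cache: Dict[int, Block] = {}  # translation cache keyed by simulated pc
    trace: List[int] = []
    translations = 0
    pc = 0
    while pc >= 0:
        block = cache.get(pc)
        if block is None:  # cache miss: translate once, reuse afterwards
            block = translate(program, pc, trace_on)
            cache[pc] = block
            translations += 1
        pc = block(regs, trace)
    return trace, translations


# A small loop: r1 counts down from 3 while r2 counts up.
PROGRAM: List[Instr] = [
    ("addi", 1, 0, 3),   # 0: r1 = r0 + 3
    ("addi", 2, 2, 1),   # 1: r2 = r2 + 1
    ("addi", 1, 1, -1),  # 2: r1 = r1 - 1
    ("bnez", 1, 1),      # 3: if r1 != 0 goto 1
    ("halt",),           # 4: stop
]
```

Calling `run(PROGRAM, [0, 0, 0, 0], trace_on=True)` executes the loop body three times but translates only three blocks (those starting at addresses 0, 1, and 4); the second and third loop iterations reuse the cached translation for address 1. With `trace_on=False` the same blocks are generated without the `trace.append` lines and the returned trace is empty.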
