Active memory: a new abstraction for memory-system simulation

This paper describes the active memory abstraction for memory-system simulation. In this abstraction, which is designed specifically for on-the-fly simulation, each memory reference logically invokes a user-specified function selected by the reference's type and the state of the accessed memory block. Active memory allows simulator writers to specify the appropriate action on each reference, including "no action" for the common case of cache hits. Because the abstraction hides implementation details, implementations can be carefully tuned for particular platforms, permitting much more efficient on-the-fly simulation than the traditional trace-driven abstraction. Our SPARC implementation, Fast-Cache, executes simple data-cache simulations two or three times faster than a highly tuned trace-driven simulator and only 2 to 7 times slower than the original program. Fast-Cache implements active memory by performing a fast table lookup of the memory block state, taking as few as 3 cycles on a SuperSPARC for the no-action case. Modeling the effects of Fast-Cache's additional lookup instructions qualitatively shows that Fast-Cache is likely to be the most efficient simulator for miss ratios between 3% and 40%.
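The dispatch rule described above (block state plus reference type selects a handler, and a hit selects "no action") can be illustrated with a minimal Python sketch. It assumes a direct-mapped data cache with 32-byte blocks and 256 sets and a two-state (INVALID/VALID) block encoding; the class names, parameters, and state encoding are hypothetical choices for illustration, not Fast-Cache's actual interface. Fast-Cache performs the state lookup in instrumented SPARC code, whereas here it is an ordinary table index.

```python
from enum import IntEnum
from typing import Callable, Optional

# Hypothetical parameters, chosen only for illustration.
BLOCK_BITS = 5   # 32-byte blocks
NUM_SETS = 256   # direct-mapped cache with 256 sets (8 KB)


class State(IntEnum):
    INVALID = 0  # a reference to this block is a miss
    VALID = 1    # a reference to this block is a hit


class Ref(IntEnum):
    LOAD = 0
    STORE = 1


Handler = Optional[Callable[[int], None]]


class ActiveMemory:
    """Per-block state table plus a (state, reference type) -> handler table.

    A None entry means "no action", the common cache-hit case."""

    def __init__(self, mem_blocks: int):
        self.state = bytearray(mem_blocks)  # one byte per block; all start INVALID
        self.handlers = [[None] * len(Ref) for _ in State]

    def reference(self, addr: int, kind: Ref) -> None:
        # A single table lookup decides whether any simulator code runs.
        handler: Handler = self.handlers[self.state[addr >> BLOCK_BITS]][kind]
        if handler is not None:
            handler(addr)


class DirectMappedCache:
    """A simple data-cache simulator written against ActiveMemory."""

    def __init__(self, mem: ActiveMemory):
        self.mem = mem
        self.resident = [None] * NUM_SETS  # block number held by each set
        self.misses = 0
        # Only references to INVALID blocks invoke simulator code.
        mem.handlers[State.INVALID][Ref.LOAD] = self.miss
        mem.handlers[State.INVALID][Ref.STORE] = self.miss

    def miss(self, addr: int) -> None:
        block = addr >> BLOCK_BITS
        s = block % NUM_SETS
        self.misses += 1
        victim = self.resident[s]
        if victim is not None:
            # The evicted block must miss on its next reference.
            self.mem.state[victim] = State.INVALID
        self.resident[s] = block
        self.mem.state[block] = State.VALID


# Usage: two passes over a 4 KB array, which fits in the 8 KB cache,
# so only the first touch of each of the 128 blocks is a miss.
mem = ActiveMemory(mem_blocks=4096)
cache = DirectMappedCache(mem)
for kind in (Ref.STORE, Ref.LOAD):
    for addr in range(0, 4096, 4):
        mem.reference(addr, kind)
print(cache.misses)  # 128
```

In this sketch the simulator never sees hits at all: the miss handler keeps the state table consistent with the simulated cache contents by marking the newly loaded block VALID and any evicted block INVALID, so the hit path is only the lookup and a test.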

David A. Wood et al. Active Memory: A New Abstraction for Memory System Simulation. ACM Trans. Model. Comput. Simul., 1997.