Trap-driven simulation with Tapeworm II

Tapeworm II is a software-based simulation tool that evaluates the cache and TLB performance of multi-task and operating-system-intensive workloads. Tapeworm resides in an OS kernel and causes a host machine's hardware to drive simulations with kernel traps rather than with address traces, as is conventionally done. This allows Tapeworm to capture complete memory-referencing behavior quickly and accurately, with limited degradation in overall system performance. This paper compares trap-driven simulation, as implemented in Tapeworm, with the more common technique of trace-driven memory simulation in terms of speed, accuracy, portability, and flexibility.
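The contrast between the two approaches can be illustrated with a toy model. The sketch below is not Tapeworm's implementation. It is a minimal user-level illustration that assumes a direct-mapped simulated cache and assumes that a trap is armed on every line not currently in that cache. The function names (`trace_driven`, `trap_driven`) and the `untrapped` set are hypothetical. In a real system the host hardware would decide whether a reference traps; here a set-membership test stands in for that check.

```python
def trace_driven(refs, num_sets, line_size):
    """Conventional approach: the simulator examines every reference in a trace."""
    tags = [None] * num_sets      # one tag per set (direct-mapped)
    misses = 0
    work = 0                      # references the simulator had to process
    for addr in refs:
        work += 1
        line = addr // line_size
        idx = line % num_sets
        if tags[idx] != line:     # miss: install the new line
            misses += 1
            tags[idx] = line
    return misses, work


def trap_driven(refs, num_sets, line_size):
    """Trap-driven approach (toy model): only simulated misses reach the simulator."""
    resident = {}                 # set index -> line held in the simulated cache
    untrapped = set()             # lines whose trap is cleared (assumed mechanism)
    misses = 0
    work = 0                      # invocations of the trap handler
    for addr in refs:
        line = addr // line_size
        if line in untrapped:
            continue              # simulated hit: no trap, no simulator work
        # "Trap handler": count the miss and update the simulated cache.
        work += 1
        misses += 1
        idx = line % num_sets
        victim = resident.get(idx)
        if victim is not None:
            untrapped.discard(victim)   # re-arm the trap on the evicted line
        resident[idx] = line
        untrapped.add(line)             # clear the trap on the new line
    return misses, work


if __name__ == "__main__":
    refs = [0, 4, 8, 0, 64, 0, 4]
    print(trace_driven(refs, num_sets=4, line_size=16))
    print(trap_driven(refs, num_sets=4, line_size=16))
```

In this toy model both functions report the same miss count, but the trap-driven version does simulator work only on misses, which is the intuition behind letting the host hardware filter out hits.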
