Facilitating level three cache studies using set sampling

We discuss some of the difficulties inherent in trace collection and trace-driven cache simulation. We then describe our multiprocessor tracing technique and verify that it accurately collects long traces. We propose sampling as a way to reduce the required disk space, speed up simulations, and effectively enlarge the trace buffer of our hardware monitor, which decreases trace distortion. To this end, we investigate time sampling and two types of set sampling. We conclude that the second set sampling technique gives the most accurate results. With this technique, the miss rate is calculated as the number of misses to the sampled sets divided by the total number of references scaled by the sample size, that is, by the fraction of sets sampled. We determined that a 10% sample size was the most accurate while still reducing the required disk space.
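The miss-rate estimate described above can be illustrated with a minimal sketch. It is not the authors' simulator: the function names, the LRU replacement policy, the random choice of sets, and the cache parameters are all assumptions made for illustration, and the abstract does not say how the two set sampling variants select their sets.

```python
from collections import OrderedDict
import random


def simulate_misses(trace, num_sets, assoc, block_bytes, sampled_sets=None):
    """Count misses in a set-associative LRU cache (hypothetical model).

    If sampled_sets is given, references that map to other sets are skipped,
    so only the sampled sets are simulated.
    """
    sets = {}
    misses = 0
    for addr in trace:
        block = addr // block_bytes
        index = block % num_sets
        if sampled_sets is not None and index not in sampled_sets:
            continue
        tag = block // num_sets
        lru = sets.setdefault(index, OrderedDict())
        if tag in lru:
            lru.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            lru[tag] = True
            if len(lru) > assoc:
                lru.popitem(last=False)  # evict the least recently used tag
    return misses


def sampled_miss_rate(trace, num_sets, assoc, block_bytes, fraction, seed=0):
    """Estimate the miss rate from a sample of sets.

    Estimate = misses to sampled sets / (total references * fraction of sets
    sampled). Picking sets uniformly at random is an assumption.
    """
    rng = random.Random(seed)
    k = max(1, round(num_sets * fraction))
    sampled = set(rng.sample(range(num_sets), k))
    misses = simulate_misses(trace, num_sets, assoc, block_bytes, sampled)
    return misses / (len(trace) * (k / num_sets))
```

Because only references to the sampled sets need to be stored and simulated, a 10% sample in this sketch would keep roughly one tenth of the trace, which is the source of the disk space and simulation time savings.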
