Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes

In previous research, we developed and presented a model for measuring machines and analyzing programs, and for accurately predicting the running time of any analyzed program on any measured machine. That work is extended here by: (1) developing a high-level program to measure the design and performance of the cache and TLB units; (2) using those measurements, along with published miss ratio data, to improve the accuracy of our runtime predictions; and (3) using our analysis tools and measurements to study and compare the design of several machines, with particular reference to their cache and TLB performance. As part of this work, we describe the design and performance of the cache and TLB for ten machines. The work presented in this paper extends a powerful technique for the evaluation and analysis of both computer systems and their workloads; this methodology is valuable both to computer users and to computer system designers.
