Efficient simulation of caches under optimal replacement with applications to miss characterization

Cache miss characterization models such as the three Cs model are useful in developing schemes to reduce cache misses and their penalty. In this paper we propose the OPT model that uses cache simulation under optimal (OPT) replacement to obtain a finer and more accurate characterization of misses than the three Cs model. However, current methods for optimal cache simulation are slow and difficult to use. We present three new techniques for optimal cache simulation. First, we propose a limited lookahead strategy with error fixing, which allows one pass simulation of multiple optimal caches. Second, we propose a scheme to group entries in the OPT stack, which allows efficient tree based fully-associative cache simulation under OPT. Third, we propose a scheme for exploiting partial inclusion in set-associative cache simulation under OPT. Simulators based on these algorithms were used to obtain cache miss characterizations using the OPT model for nine SPEC benchmarks. The results indicate that miss ratios under OPT are substantially lower than those under LRU replacement, by up to 70% in fully-associative caches, and up to 32% in two-way set-associative caches.

[1]  Frank Olken,et al.  Efficient methods for calculating the success function of fixed space replacement policies , 1983, Perform. Evaluation.

[2]  W. W. Hwu,et al.  Achieving high instruction cache performance with an optimizing compiler , 1989, ISCA '89.

[3]  Irving L. Traiger,et al.  Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..

[4]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[5]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[6]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre , 1990, ISCA 1990.

[7]  Alan Jay Smith,et al.  Evaluating Associativity in CPU Caches , 1989, IEEE Trans. Computers.

[8]  Alan Jay Smith,et al.  Aspects of cache memory and instruction buffer performance , 1987 .

[9]  Vincent J. Kruskal,et al.  LRU Stack Processing , 1975, IBM J. Res. Dev..

[10]  Thomas M. Conte,et al.  The Susceptibility of Programs to Context Switching , 1994, IEEE Trans. Computers.

[11]  S. McFarling Program optimization for instruction caches , 1989, ASPLOS 1989.

[12]  S. Abraham,et al.  Eecient Simulation of Multiple Cache Conngurations Using Binomial Trees , 1991 .

[13]  David A. Wood,et al.  Implementing stack simulation for highly-associative memories , 1991, SIGMETRICS '91.

[14]  Scott Arthur McFarling Program analysis and optimization for machines with instruction cache , 1992 .

[15]  Alan Jay Smith,et al.  Efficient Analysis of Caching Systems , 1987 .

[16]  Robert E. Tarjan,et al.  Self-adjusting binary search trees , 1985, JACM.

[17]  Laxmi N. Bhuyan,et al.  High-performance computer architecture , 1995, Future Gener. Comput. Syst..

[18]  Dominique Thiébaut,et al.  On the Fractal Dimension of Computer Programs and its Application to the Prediction of the Cache Miss Ratio , 1989, IEEE Trans. Computers.

[19]  Anant Agarwal,et al.  Analysis of cache performance for operating systems and multiprogramming , 1989, The Kluwer international series in engineering and computer science.

[20]  Laszlo A. Belady,et al.  A Study of Replacement Algorithms for Virtual-Storage Computer , 1966, IBM Syst. J..

[21]  Ayman I. Kayssi,et al.  The design of a microsupercomputer , 1991, Computer.

[22]  Mark D. Hill,et al.  Aspects of Cache Memory and Instruction , 1987 .