Using the compiler to improve cache replacement decisions
Arnold L. Rosenberg | Kathryn S. McKinley | Charles C. Weems | Zhenlin Wang
[1] D. Burger,et al. Memory Bandwidth Limitations of Future Microprocessors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[2] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[3] Yannis Smaragdakis,et al. EELRU: simple and effective adaptive page replacement , 1999, SIGMETRICS '99.
[4] Gary S. Tyson,et al. Utilizing reuse information in data cache management , 1998, ICS '98.
[5] Vikas Agarwal,et al. Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[6] Sarita V. Adve,et al. RSIM Reference Manual: Version 1.0 , 1997 .
[7] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPLAS.
[8] Anant Agarwal,et al. Column-associative caches: a technique for reducing the miss rate of direct-mapped caches , 1993, ISCA '93.
[9] Olivier Temam,et al. An Algorithm for Optimally Exploiting Spatial and Temporal Locality in Upper Memory Levels , 1999, IEEE Trans. Computers.
[10] A. L. Rosenberg,et al. Improving Replacement Decisions in Set-Associative Caches , 2001 .
[11] Wei-Fen Lin,et al. Reducing DRAM latencies with an integrated memory hierarchy design , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[12] Richard E. Kessler,et al. The Alpha 21264 microprocessor architecture , 1998, Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273).
[13] Mahmut T. Kandemir,et al. A matrix-based approach to the global locality optimization problem , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[14] Olivier Temam,et al. Load Scheduling with Profile Information , 2000, Euro-Par.
[15] Ken Kennedy,et al. Practical dependence testing , 1991, PLDI '91.
[16] Olivier Temam,et al. Quantifying loop nest locality using SPEC'95 and the perfect benchmarks , 1999, TOCS.
[17] Santosh G. Abraham,et al. Efficient simulation of caches under optimal replacement with applications to miss characterization , 1993, SIGMETRICS '93.
[18] Sharad Malik,et al. Precise miss analysis for program transformations with caches of arbitrary associativity , 1998, ASPLOS VIII.
[19] Carole Dulong,et al. The IA-64 Architecture at Work , 1998, Computer.
[20] Wen-mei W. Hwu,et al. Run-time spatial locality detection and optimization , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[21] Walid Abu-Sufah,et al. Improving the performance of virtual memory computers. , 1979 .
[22] Sally A. McKee,et al. Smarter Memory: Improving Bandwidth for Streamed References , 1998, Computer.
[23] Lixin Zhang. URSIM Reference Manual , 1999 .
[24] Arnold L. Rosenberg,et al. Improving Replacement Decisions in Set-Associative Caches , 2001 .
[25] Ken Kennedy,et al. An Implementation of Interprocedural Bounded Regular Section Analysis , 1991, IEEE Trans. Parallel Distributed Syst.
[26] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[27] Jean-Loup Baer,et al. Modified LRU policies for improving second-level cache behavior , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[28] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[29] Laszlo A. Belady,et al. A Study of Replacement Algorithms for a Virtual-Storage Computer , 1966, IBM Syst. J..
[30] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.
[31] Mark D. Hill,et al. A case for direct-mapped caches , 1988, Computer.
[32] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[33] Babak Falsafi,et al. Dead-block prediction & dead-block correlating prefetchers , 2001, ISCA 2001.
[34] William Pugh,et al. A practical algorithm for exact array dependence analysis , 1992, CACM.