Cooperative caching with keep-me and evict-me
[1] R. E. Kessler,et al. Inexpensive implementations of set-associativity , 1989, ISCA '89.
[2] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[3] Wei-Fen Lin,et al. Reducing DRAM latencies with an integrated memory hierarchy design , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[4] Guang R. Gao,et al. Compiler-Assisted Cache Replacement: Problem Formulation and Performance Evaluation , 2003, LCPC.
[5] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[6] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[7] Laszlo A. Belady,et al. A Study of Replacement Algorithms for a Virtual-Storage Computer , 1966, IBM Syst. J..
[8] Dirk Grunwald,et al. Predictive sequential associative cache , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[9] Richard E. Kessler,et al. Evaluating stream buffers as a secondary cache replacement , 1994, Proceedings of 21 International Symposium on Computer Architecture.
[10] Margaret Martonosi,et al. Precise miss analysis for program transformations with caches of arbitrary associativity , 1998 .
[11] Yannis Smaragdakis,et al. EELRU: simple and effective adaptive page replacement , 1999, SIGMETRICS '99.
[12] Erik Brunvand,et al. Impulse: building a smarter memory controller , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[13] Gary S. Tyson,et al. Region-based caching: an energy-delay efficient memory architecture for embedded processors , 2000, CASES '00.
[14] Kathryn S. McKinley,et al. Cooperative hardware/software caching for next-generation memory systems , 2004 .
[15] Ken Kennedy,et al. An Implementation of Interprocedural Bounded Regular Section Analysis , 1991, IEEE Trans. Parallel Distributed Syst..
[16] David J. Sager,et al. The microarchitecture of the Pentium 4 processor , 2001 .
[17] D. Burger,et al. Memory Bandwidth Limitations of Future Microprocessors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[18] François Bodin,et al. Skewed associativity enhances performance predictability , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[19] Olivier Temam,et al. Quantifying loop nest locality using SPEC'95 and the perfect benchmarks , 1999, TOCS.
[20] Jean-Loup Baer,et al. Modified LRU policies for improving second-level cache behavior , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[21] Steven W. White,et al. POWER3: The next generation of PowerPC processors , 2000, IBM J. Res. Dev..
[22] Jih-Kwon Peir,et al. Capturing dynamic memory reference behavior with adaptive cache topology , 1998, ASPLOS VIII.
[23] Dileep Bhandarkar,et al. Performance characterization of the Pentium Pro processor , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[24] Arnold L. Rosenberg,et al. Using the compiler to improve cache replacement decisions , 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[25] Ken Kennedy,et al. Practical dependence testing , 1991, PLDI '91.
[26] Steven K. Reinhardt,et al. A fully associative software-managed cache design , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[27] Richard E. Kessler,et al. The Alpha 21264 microprocessor architecture , 1998, Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273).
[28] Babak Falsafi,et al. Dead-block prediction & dead-block correlating prefetchers , 2001, ISCA 2001.
[29] William Pugh,et al. A practical algorithm for exact array dependence analysis , 1992, CACM.
[30] Kathryn S. McKinley,et al. Guided region prefetching: a cooperative hardware/software approach , 2003, ISCA '03.
[31] Olivier Temam,et al. An Algorithm for Optimally Exploiting Spatial and Temporal Locality in Upper Memory Levels , 1999, IEEE Trans. Computers.
[32] A. Agarwal,et al. Column-associative Caches: A Technique For Reducing The Miss Rate Of Direct-mapped Caches , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[33] Wen-mei W. Hwu,et al. Run-time spatial locality detection and optimization , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[34] Walid Abu-Sufah,et al. Improving the performance of virtual memory computers. , 1979 .
[35] Mahmut T. Kandemir,et al. A matrix-based approach to the global locality optimization problem , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[36] T. N. Vijaykumar,et al. Reactive-associative caches , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[37] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[38] Santosh G. Abraham,et al. Efficient simulation of caches under optimal replacement with applications to miss characterization , 1993, SIGMETRICS '93.
[39] Sally A. McKee,et al. Smarter Memory: Improving Bandwidth for Streamed References , 1998, Computer.
[40] Sharad Malik,et al. Precise miss analysis for program transformations with caches of arbitrary associativity , 1998, ASPLOS VIII.
[41] Carole Dulong,et al. The IA-64 Architecture at Work , 1998, Computer.
[42] Gary S. Tyson,et al. Utilizing reuse information in data cache management , 1998, ICS '98.
[43] Vikas Agarwal,et al. Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[44] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPLAS.
[45] Anant Agarwal,et al. Column-associative caches: a technique for reducing the miss rate of direct-mapped caches , 1993, ISCA '93.