DRAM-Level Prefetching for Fully-Buffered DIMM: Design, Performance and Power Saving
Zhao Zhang | Hongzhong Zheng | Jiang Lin | Zhichun Zhu | Howard David
[1] David J. Lilja,et al. Data prefetch mechanisms , 2000, CSUR.
[2] Kunle Olukotun,et al. Maximizing CMP throughput with mediocre cores , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[3] James E. Smith,et al. Performance of Cached DRAM Organizations in Vector Supercomputers , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[4] Aamer Jaleel,et al. Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[5] Wei-Fen Lin,et al. Reducing DRAM latencies with an integrated memory hierarchy design , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[6] Zhao Zhang,et al. A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality , 2000, MICRO 33.
[7] Zhao Zhang,et al. Cached DRAM for ILP Processor Memory Access Latency Reduction , 2001, IEEE Micro.
[8] Robert Cypher,et al. Trends and trade-offs in designing highly robust throughput computing oriented chips and systems , 2005, 11th IEEE International On-Line Testing Symposium.
[9] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[10] Jon Haas,et al. Fully-Buffered DIMM Technology Moves Enterprise Platforms to the Next Level , 2005.
[11] Nathan L. Binkert,et al. Network-Oriented Full-System Simulation using M5 , 2003.
[12] Zhao Zhang,et al. A performance comparison of DRAM memory system optimizations for SMT processors , 2005, 11th International Symposium on High-Performance Computer Architecture.
[13] Gershon Kedem,et al. WCDRAM: A fully associative integrated Cached-DRAM with wide cache lines , 1997.
[14] William J. Dally,et al. Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[15] T. Sherwood,et al. Predictor-directed stream buffers , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.
[16] Brad Calder,et al. Automatically characterizing large scale program behavior , 2002, ASPLOS X.
[17] Rami Marwan Nasr,et al. FBsim and the Fully Buffered DIMM Memory System Architecture , 2005.
[18] Hideto Hidaka,et al. The cache DRAM architecture: a DRAM with an on-chip cache memory , 1990, IEEE Micro.
[19] Dean M. Tullsen,et al. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor , 2002, SIGMETRICS '02.
[20] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.
[21] James R. Goodman,et al. Memory Bandwidth Limitations of Future Microprocessors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[22] Trevor N. Mudge,et al. A performance comparison of contemporary DRAM architectures , 1999, ISCA.
[23] John L. Henning. SPEC CPU2000: Measuring CPU Performance in the New Millennium , 2000, Computer.
[24] Charles A. Hart. CDRAM in a unified memory architecture , 1994, Proceedings of COMPCON '94.