Techniques utilizing memory reference characteristics for improved performance
[1] Gary S. Tyson,et al. Utilizing reuse information in data cache management , 1998, ICS '98.
[2] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[3] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[4] Mikko H. Lipasti,et al. Value locality and load value prediction , 1996, ASPLOS VII.
[5] Allan Porterfield,et al. The Tera computer system , 1990, ICS '90.
[6] David A. Wood,et al. Cache profiling and the SPEC benchmarks: a case study , 1994, Computer.
[7] Jean-Loup Baer,et al. Modified LRU policies for improving second-level cache behavior , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[8] Gary S. Tyson,et al. A modified approach to data cache management , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[9] Douglas J. Joseph,et al. Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[10] Brad Calder,et al. Efficient procedure mapping using cache line coloring , 1997, PLDI '97.
[11] Per Stenström,et al. A prefetching technique for irregular accesses to linked data structures , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[12] Luiz André Barroso,et al. Memory system characterization of commercial workloads , 1998, ISCA.
[13] Billy Garrett,et al. RDRAMs: a new speed paradigm , 1994, Proceedings of COMPCON '94.
[14] Alec Wolman,et al. The structure and performance of interpreters , 1996, ASPLOS VII.
[15] Sally A. McKee,et al. Access ordering and memory-conscious cache utilization , 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.
[16] Chandra Krintz,et al. Cache-conscious data placement , 1998, ASPLOS VIII.
[17] Fong Pong,et al. Missing the Memory Wall: The Case for Processor/Memory Integration , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[18] Gurindar S. Sohi,et al. Effective jump-pointer prefetching for linked data structures , 1999, ISCA.
[19] David J. Lilja,et al. A compiler-assisted data prefetch controller , 1999, Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040).
[20] Fu-Chieh Hsu,et al. The ideal SoC memory: 1T-SRAM™ , 2000, Proceedings of 13th Annual IEEE International ASIC/SOC Conference (Cat. No.00TH8541).
[21] Alan Jay Smith,et al. Aspects of cache memory and instruction buffer performance , 1987 .
[22] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[23] Anant Agarwal,et al. Column-associative caches: a technique for reducing the miss rate of direct-mapped caches , 1993, ISCA '93.
[24] Michael L. Scott,et al. Cache performance in vector supercomputers , 1994, Proceedings of Supercomputing '94.
[25] K. Kavi. Cache Memories Cache Memories in Uniprocessors. Reading versus Writing. Improving Performance , 2022 .
[26] Yasunao Katayama,et al. A 22-ns 1-Mbit CMOS high-speed DRAM with address multiplexing , 1989 .
[27] Jean-Loup Baer,et al. Effective Hardware Based Data Prefetching for High-Performance Processors , 1995, IEEE Trans. Computers.
[28] Thomas Alexander,et al. Distributed prefetch-buffer/cache design for high performance memory systems , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[29] John H. Zurawski,et al. The Design and Verification of the AlphaStation 600 5-series Workstation , 1995, Digit. Tech. J..
[30] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.
[31] Wen-mei W. Hwu,et al. Run-Time Adaptive Cache Hierarchy Management via Reference Analysis , 1997, ISCA.
[32] Dileep Bhandarkar,et al. Performance characterization of the Pentium Pro processor , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[33] Brian N. Bershad,et al. Execution characteristics of desktop applications on Windows NT , 1998, ISCA.
[34] Zhao Zhang,et al. A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality , 2000, MICRO 33.
[35] Brad Calder,et al. Predictor-directed stream buffers , 2000, MICRO 33.
[36] James K. Archibald,et al. Evaluating performance of prefetching second level caches , 1993, PERV.
[37] D. Burger,et al. Datascalar Architectures , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[38] Alvin R. Lebeck,et al. Load latency tolerance in dynamically scheduled processors , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[39] Trevor N. Mudge,et al. A performance comparison of contemporary DRAM architectures , 1999, ISCA.
[40] Shlomit S. Pinter,et al. Tango: a hardware-based data prefetching technique for superscalar processors , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[41] Steven Przybylski. The performance impact of block sizes and fetch strategies , 1990, ISCA '90.
[42] Michel Dubois,et al. Sequential Hardware Prefetching in Shared-Memory Multiprocessors , 1995, IEEE Trans. Parallel Distributed Syst..
[43] Steven K. Reinhardt,et al. A fully associative software-managed cache design , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[44] Chia-Lin Yang,et al. Push vs. pull: data movement for linked data structures , 2000, ICS '00.
[45] Karl Pettis,et al. Profile guided code positioning , 1990, PLDI '90.
[46] Goro Kitsukawa,et al. A 23-ns 1-Mb BiCMOS DRAM , 1990 .
[47] Santosh G. Abraham,et al. Efficient simulation of caches under optimal replacement with applications to miss characterization , 1993, SIGMETRICS '93.
[48] Katherine Yelick,et al. A Case for Intelligent RAM: IRAM , 1997 .
[49] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[50] Mark J. Charney,et al. Prefetching and memory system behavior of the SPEC95 benchmark suite , 1997, IBM J. Res. Dev..
[51] Laszlo A. Belady,et al. A Study of Replacement Algorithms for a Virtual-Storage Computer , 1966, IBM Syst. J..
[52] Michael F. Deering,et al. FBRAM: a new form of memory optimized for 3D graphics , 1994, SIGGRAPH.
[53] Yasuhiro Konishi,et al. A 100-MHz 4-Mb cache DRAM with fast copy-back scheme , 1992 .
[54] Andreas Moshovos,et al. Dependence based prefetching for linked data structures , 1998, ASPLOS VIII.
[55] David Kroft,et al. Lockup-free instruction fetch/prefetch cache organization , 1981, ISCA '81.
[56] James R. Goodman,et al. Instruction Cache Replacement Policies and Organizations , 1985, IEEE Transactions on Computers.
[57] P. Chow,et al. Memory-system Design Considerations For Dynamically-scheduled Processors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[58] James E. Smith,et al. Performance Of Cached Dram Organizations In Vector Supercomputers , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[59] Sharon E. Perl,et al. Studies of Windows NT performance using dynamic execution traces , 1996, OSDI '96.
[60] Ann Marie Grizzaffi Maynard,et al. Contrasting characteristics and cache performance of technical and multi-user commercial workloads , 1994, ASPLOS VI.
[61] Babak Falsafi,et al. Dead-block prediction & dead-block correlating prefetchers , 2001, ISCA 2001.
[62] Craig B. Zilles. Benchmark health considered harmful , 2001, CARN.
[63] David W. Wall,et al. Generation and analysis of very long address traces , 1990, ISCA '90.
[64] Richard E. Kessler,et al. Evaluating stream buffers as a secondary cache replacement , 1994, Proceedings of the 21st International Symposium on Computer Architecture.
[65] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, ISCA 1990.
[66] Irving L. Traiger,et al. Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..
[67] Trevor N. Mudge,et al. Trace-driven memory simulation: a survey , 1997, CSUR.
[68] Gurindar S. Sohi. Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers , 1990 .
[69] Charles A. Hart. CDRAM in a unified memory architecture , 1994, Proceedings of COMPCON '94.
[70] Anne Rogers,et al. Supporting dynamic data structures on distributed-memory machines , 1995, TOPL.