Cache miss heuristics and preloading techniques for general-purpose programs
Mikko H. Lipasti | Steven R. Kunkel | William J. Schmidt | Robert Ralph Roediger | William Jon Schmidt | Robert R. Roediger | S. Kunkel | Toshihiro Ozawa | Yasunori Kimura | Shin'ichiro Nishizaki
[1] Donald J. Hatfield,et al. Program Restructuring for Virtual Memory , 1971, IBM Syst. J..
[2] Ken Kennedy,et al. Blocking Linear Algebra Codes for Memory Hierarchies , 1989, PPSC.
[3] David Bernstein,et al. Compiler techniques for data prefetching on the PowerPC , 1995, PACT.
[4] Anant Agarwal,et al. On-Chip Instruction Caches for High Performance Processors , 1987 .
[5] Scott A. Mahlke,et al. Using profile information to assist classic code optimizations , 1991, Softw. Pract. Exp..
[6] W. W. Hwu,et al. Achieving high instruction cache performance with an optimizing compiler , 1989, ISCA '89.
[7] Mikko H. Lipasti,et al. Architecture-compatible code boosting for performance enhancement of the IBM RS/6000 , 1993, Proceedings of 1993 IEEE International Conference on Computer Design ICCD'93.
[8] Stephen J. Hartley. Compile-Time Program Restructuring in Multiprogrammed Virtual Memory Systems , 1988, IEEE Trans. Software Eng..
[9] Youfeng Wu. Ordering functions for improving memory reference locality in a shared memory multiprocessor system , 1992, MICRO 1992.
[10] Santosh G. Abraham,et al. Iteration Partitioning for Resolving Stride Conflicts on Cache-Coherent Multiprocessors , 1993, 1993 International Conference on Parallel Processing - ICPP'93.
[11] David A. Wood,et al. Cache profiling and the SPEC benchmarks: a case study , 1994, Computer.
[12] Scott McFarling,et al. Procedure merging with instruction caches , 1991, PLDI '91.
[13] Henry M. Levy,et al. An architecture for software-controlled data prefetching , 1991, ISCA '91.
[14] Alan Jay Smith,et al. Aspects of cache memory and instruction buffer performance , 1987 .
[15] Randall R. Heisch. Trace-directed program restructuring for AIX executables , 1994, IBM J. Res. Dev..
[16] Janak H. Patel,et al. Data prefetching in multiprocessor vector cache memories , 1991, ISCA '91.
[17] Apostolos Dollas,et al. Predicting and precluding problems with memory latency , 1994, IEEE Micro.
[18] Walid A. Najjar,et al. An evaluation of bottom-up and top-down thread generation techniques , 1993, MICRO 1993.
[19] Shlomit S. Pinter,et al. Compile time instruction cache optimizations , 1994, CARN.
[20] Ken Kennedy,et al. Software prefetching , 1991, ASPLOS IV.
[21] Anoop Gupta,et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors , 1991, J. Parallel Distributed Comput..
[22] Vivek Sarkar,et al. On Estimating and Enhancing Cache Effectiveness , 1991, LCPC.
[23] Jean-Loup Baer,et al. Reducing memory latency via non-blocking and prefetching caches , 1992, ASPLOS V.
[24] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[25] Ken Kennedy,et al. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution , 1993, LCPC.
[26] Duncan H. Lawrie,et al. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations , 1981, IEEE Transactions on Computers.
[27] Anne Rogers,et al. Software support for speculative loads , 1992, ASPLOS V.
[28] Scott A. Mahlke,et al. Data access microarchitectures for superscalar processors with compiler-assisted data prefetching , 1991, MICRO 24.
[29] Gurindar S. Sohi,et al. High-bandwidth data memory systems for superscalar processors , 1991, ASPLOS IV.
[30] Ken Kennedy,et al. Software methods for improvement of cache performance on supercomputer applications , 1989 .
[31] Alan Jay Smith,et al. Cache Memories , 1982, CSUR.
[32] Ann Marie Grizzaffi Maynard,et al. Contrasting characteristics and cache performance of technical and multi-user commercial workloads , 1994, ASPLOS VI.
[33] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre , 1990, ISCA 1990.
[34] Michel Dubois,et al. Concurrent Miss Resolution in Multiprocessor Caches , 1988, ICPP.
[35] Dirk Grunwald,et al. Reducing branch costs via branch alignment , 1994, ASPLOS VI.
[36] Jean-Loup Baer,et al. A performance study of software and hardware data prefetching schemes , 1994, ISCA '94.
[37] Rajiv Gupta,et al. Improving instruction cache behavior by reducing cache pollution , 1990, Proceedings SUPERCOMPUTING '90.
[38] Todd C. Mowry,et al. Tolerating latency through software-controlled data prefetching , 1994 .
[39] Chau-Wen Tseng,et al. Compiler optimizations for improving data locality , 1994, ASPLOS VI.
[40] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[41] Rajiv Gupta,et al. Predictability of load/store instruction latencies , 1993, Proceedings of the 26th Annual International Symposium on Microarchitecture.
[42] Robert P. Colwell,et al. A VLIW architecture for a trace scheduling compiler , 1987, ASPLOS 1987.
[43] Dionisios N. Pnevmatikatos,et al. Cache performance of the SPEC92 benchmark suite , 1993, IEEE Micro.
[44] Domenico Ferrari,et al. Improving locality by critical working sets , 1974, CACM.
[45] Brian N. Bershad,et al. The impact of operating system structure on memory system performance , 1994, SOSP '93.
[46] M. K. Farrens,et al. Improving performance of small on-chip instruction caches , 1989, ISCA '89.
[47] Steven A. Przybylski,et al. Cache and memory hierarchy design: a performance-directed approach , 1990 .
[48] Scott A. Mahlke,et al. Tolerating First Level Memory Access Latency in High-Performance Systems , 1992, ICPP.
[49] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[50] Anoop Gupta,et al. Comparative evaluation of latency reducing and tolerating techniques , 1991, ISCA '91.