Data Prefetching for Non-Linear Memory References
[1] Chau-Wen Tseng,et al. Locality Optimizations for Multi-Level Caches , 1999, SC.
[2] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[3] M. Castells. Multilevel tiling for non-rectangular iteration spaces , 1999 .
[4] Chau-Wen Tseng,et al. A Comparison of Compiler Tiling Algorithms , 1999, CC.
[5] Jack L. Lo,et al. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[6] Jean-Loup Baer,et al. A performance study of software and hardware data prefetching schemes , 1994, ISCA '94.
[7] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[8] K. Sundaramoorthy,et al. Slipstream processors: improving both performance and fault tolerance , 2000, SIGP.
[9] Jeffrey F. Naughton,et al. Cache Conscious Algorithms for Relational Query Processing , 1994, VLDB.
[10] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[11] Theo Ungerer,et al. Identifying Bottlenecks in a Multithreaded Superscalar Microprocessor , 1996, Euro-Par, Vol. II.
[12] Mithuna Thottethodi,et al. Nonlinear array layouts for hierarchical memory systems , 1999, ICS '99.
[13] Chau-Wen Tseng,et al. Eliminating conflict misses for high performance architectures , 1998, ICS '98.
[14] Keshav Pingali,et al. An experimental evaluation of tiling and shackling for memory hierarchy management , 1999, ICS '99.
[15] Ken Kennedy,et al. Software methods for improvement of cache performance on supercomputer applications , 1989 .
[16] Chau-Wen Tseng,et al. Data transformations for eliminating conflict misses , 1998, PLDI.
[17] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[18] Jacqueline Chame,et al. A tile selection algorithm for data locality and cache interference , 1999, ICS '99.
[19] Wei Li,et al. Unifying data and control transformations for distributed shared-memory machines , 1995, PLDI '95.
[20] Alexander V. Veidenbaum,et al. Compiler-directed data prefetching in multiprocessors with memory hierarchies , 1990 .
[21] Hiroshi Nakamura,et al. Augmenting Loop Tiling with Data Alignment for Improved Cache Performance , 1999, IEEE Trans. Computers.
[22] Tien-Fu Chen,et al. Alternative implementations of hybrid branch predictors , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[23] Todd C. Mowry,et al. Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.
[24] Olivier Temam,et al. To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts , 1993, Supercomputing '93. Proceedings.
[25] Luddy Harrison. Examination of a memory access classification scheme for pointer-intensive and numeric programs , 1996, ICS '96.
[26] Jean-Loup Baer,et al. Effective Hardware Based Data Prefetching for High-Performance Processors , 1995, IEEE Trans. Computers.
[27] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[28] Vivek Sarkar,et al. On Estimating and Enhancing Cache Effectiveness , 1991, LCPC.
[29] Jean-Loup Baer,et al. Reducing memory latency via non-blocking and prefetching caches , 1992, ASPLOS V.
[30] Per Stenström,et al. Evaluation of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors , 1996, IEEE Trans. Parallel Distributed Syst..
[31] Graham R. Nudd,et al. Analytical Modeling of Set-Associative Cache Behavior , 1999, IEEE Trans. Computers.
[32] Tien-Fu Chen,et al. Data prefetching for high-performance processors , 1993 .
[33] Olivier Temam,et al. Cache interference phenomena , 1994, SIGMETRICS.
[34] Dean M. Tullsen,et al. Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[35] Mahmut T. Kandemir,et al. A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts , 1999, IEEE Trans. Parallel Distributed Syst..
[36] Todd C. Mowry,et al. Automatic Compiler-Inserted Prefetching for Pointer-Based Applications , 1999, IEEE Trans. Computers.
[37] D. Tullsen,et al. ILP versus TLP on SMT , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[38] David A. Wood,et al. Cache profiling and the SPEC benchmarks: a case study , 1994, Computer.
[39] Douglas J. Joseph,et al. Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[40] Kozo Kimura,et al. An elementary processor architecture with simultaneous instruction issuing from multiple threads , 1992, ISCA '92.
[41] Karim Esseghir. Improving data locality for caches , 1993 .
[42] Mithuna Thottethodi,et al. Recursive array layouts and fast parallel matrix multiplication , 1999, SPAA '99.
[43] Mahmut T. Kandemir,et al. A compiler algorithm for optimizing locality in loop nests , 1997, ICS '97.
[44] Todd C. Mowry,et al. Tolerating latency through software-controlled data prefetching , 1994 .
[45] Sharad Malik,et al. Cache miss equations: a compiler framework for analyzing and tuning memory behavior , 1999, TOPL.
[46] Ken Kennedy,et al. Software prefetching , 1991, ASPLOS IV.
[47] Jeremy D. Frens,et al. Morton-order Matrices Deserve Compilers’ Support, Technical Report 533 , 1999 .
[48] Anoop Gupta,et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors , 1991, J. Parallel Distributed Comput..
[49] Janak H. Patel,et al. Data prefetching in multiprocessor vector cache memories , 1991, ISCA '91.
[50] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPL.
[51] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[52] David S. Wise. Ahnentafel Indexing into Morton-Ordered Arrays, or Matrix Locality for Free , 2000, Euro-Par.
[53] Scott A. Mahlke,et al. Data access microarchitectures for superscalar processors with compiler-assisted data prefetching , 1991, MICRO 24.
[54] Mahmut T. Kandemir,et al. Improving Cache Locality by a Combination of Loop and Data Transformation , 1999, IEEE Trans. Computers.