Increasing Temporal Locality with Skewing and Recursive Blocking
[1] Keshav Pingali, et al. Tiling Imperfectly-nested Loop Nests, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[2] Keshav Pingali, et al. Automatic Generation of Block-Recursive Codes, 2000, Euro-Par.
[3] Ken Kennedy, et al. Transforming loops to recursion for multi-level memory hierarchies, 2000, PLDI '00.
[4] Robert D. Falgout, et al. Semicoarsening Multigrid on Distributed Memory Machines, 1999, SIAM J. Sci. Comput.
[5] Matteo Frigo, et al. Cache-oblivious algorithms, 1999, 40th Annual Symposium on Foundations of Computer Science.
[6] Zhiyuan Li, et al. A Compiler Framework for Tiling Imperfectly-Nested Loops, 1999, LCPC.
[7] Zhiyuan Li, et al. New tiling techniques to improve cache temporal locality, 1999, PLDI '99.
[8] John D. McCalpin, et al. Time Skewing: A Value-Based Approach to Optimizing for Memory Locality, 1999.
[9] Vikram S. Adve, et al. High Performance Fortran Compilation Techniques for Parallelizing Scientific Codes, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[10] Vikram S. Adve, et al. Using integer sets for data-parallel program analysis and optimization, 1998, PLDI.
[11] Jeremy D. Frens, et al. Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code, 1997, PPOPP '97.
[12] Keshav Pingali, et al. Data-centric multi-level blocking, 1997, PLDI '97.
[13] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPLAS.
[14] Kathryn S. McKinley, et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.
[15] Michael Wolfe, et al. High performance compilers for parallel computing, 1995.
[16] Ken Kennedy, et al. Improving the ratio of memory operations to floating-point operations in loops, 1994, TOPLAS.
[17] Ken Kennedy, et al. Compiler blockability of numerical algorithms, 1992, Proceedings Supercomputing '92.
[18] Ken Kennedy, et al. Interprocedural transformations for parallel code generation, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[19] Monica S. Lam, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[20] Ken Kennedy, et al. Improving register allocation for subscripted variables, 1990, PLDI '90.