Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion
Ken Kennedy | Qing Yi
[1] Ken Kennedy, et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2001.
[2] Dennis Gannon, et al. Strategies for cache and local memory management by global program transformation, 1988, J. Parallel Distributed Comput.
[3] Allen, et al. Optimizing Compilers for Modern Architectures, 2004.
[4] Tarek S. Abdelrahman, et al. Fusion of Loops for Parallelism and Locality, 1997, IEEE Trans. Parallel Distributed Syst.
[5] V. Sarkar, et al. Collective Loop Fusion for Array Contraction, 1992, LCPC.
[6] Alain Darte. On the Complexity of Loop Fusion, 2000, Parallel Comput.
[7] Utpal Banerjee, et al. Dependence analysis for supercomputing, 1988, The Kluwer international series in engineering and computer science.
[8] Ken Kennedy, et al. Improving the ratio of memory operations to floating-point operations in loops, 1994, TOPL.
[9] Monica S. Lam, et al. An affine partitioning algorithm to maximize parallelism and minimize communication, 1999, ICS '99.
[10] Ken Kennedy, et al. Typed Fusion with Applications to Parallel and Sequential Code Generation, 1994.
[11] William Pugh, et al. Transitive Closure of Infinite Graphs and its Applications, 1995, Int. J. Parallel Program.
[12] Ken Kennedy, et al. Transforming loops to recursion for multi-level memory hierarchies, 2000, PLDI '00.
[13] William Pugh, et al. The Omega Library interface guide, 1995.
[14] Kathryn S. McKinley, et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.
[15] Jack Dongarra, et al. LINPACK Users' Guide, 1987.
[16] Keshav Pingali, et al. Data-centric multi-level blocking, 1997, PLDI '97.
[17] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPL.
[18] Michael Wolfe, et al. More iteration space tiling, 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[19] Ken Kennedy. Fast greedy weighted fusion, 2000, ICS '00.
[20] F. Gustavson, et al. Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine, 1984.
[21] William Pugh, et al. Uniform techniques for loop optimization, 1991, ICS '91.
[22] Gene H. Golub, et al. Matrix computations, 1983.
[23] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[24] William W. Pugh, et al. Fine-grained analysis of array computations, 1998.
[25] William Pugh, et al. Iteration Space Slicing for Locality, 1999, LCPC.
[26] Keshav Pingali, et al. Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests, 2001, International Journal of Parallel Programming.
[27] Ken Kennedy, et al. Transforming Complex Loop Nests for Locality, 2004, The Journal of Supercomputing.
[28] Michael E. Wolf, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.