Automatic blocking of QR and LU factorizations for locality
Ken Kennedy | Jack J. Dongarra | Qing Yi | Haihang You | Keith Seymour
[1] Monica S. Lam, et al. An affine partitioning algorithm to maximize parallelism and minimize communication, 1999, ICS '99.
[2] Steve Carr, et al. Compiler blockability of dense matrix factorizations, 1997, TOMS.
[3] Ken Kennedy, et al. Typed Fusion with Applications to Parallel and Sequential Code Generation, 1994.
[4] Ken Kennedy, et al. Compiler blockability of numerical algorithms, 1992, Proceedings Supercomputing '92.
[5] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[6] Larry Carter, et al. Quantifying the Multi-level Nature of Tiling Interactions, 1997, LCPC.
[7] Larry Carter, et al. Hierarchical tiling for improved superscalar performance, 1995, Proceedings of 9th International Parallel Processing Symposium.
[8] F. Gustavson, et al. Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine, 1984.
[9] Keshav Pingali, et al. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests, 2000.
[10] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPLAS.
[11] William Pugh, et al. Iteration Space Slicing for Locality, 1999, LCPC.
[12] William Pugh, et al. The Omega Library interface guide, 1995.
[13] Kathryn S. McKinley, et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.
[14] William Pugh, et al. Uniform techniques for loop optimization, 1991, ICS '91.
[15] Ken Kennedy, et al. Transforming Complex Loop Nests for Locality, 2004, The Journal of Supercomputing.
[16] Ken Kennedy. Fast greedy weighted fusion, 2000, ICS '00.
[17] Allen, et al. Optimizing Compilers for Modern Architectures, 2004.
[18] Ken Kennedy, et al. Improving the ratio of memory operations to floating-point operations in loops, 1994, TOPLAS.
[19] Ken Kennedy, et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2001.
[20] Keshav Pingali, et al. Data-centric multi-level blocking, 1997, PLDI '97.
[21] Monica S. Lam, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[22] Chau-Wen Tseng, et al. Data transformations for eliminating conflict misses, 1998, PLDI '98.