Automatic tiling of iterative stencil loops
[1] Leland L. Beck, et al. Smallest-last ordering and clustering and graph coloring algorithms, 1983, JACM.
[2] Viktor K. Prasanna, et al. Analysis of memory hierarchy performance of block data layout, 2002, Proceedings International Conference on Parallel Processing.
[3] Cheng Wang, et al. Data locality enhancement by memory reduction, 2001, ICS '01.
[4] Jeremy D. Frens, et al. Language support for Morton-order matrices, 2001, PPoPP '01.
[5] Mithuna Thottethodi, et al. Recursive array layouts and fast parallel matrix multiplication, 1999, SPAA '99.
[6] Mithuna Thottethodi, et al. Nonlinear array layouts for hierarchical memory systems, 1999, ICS '99.
[7] Vivek Sarkar, et al. On Estimating and Enhancing Cache Effectiveness, 1991, LCPC.
[8] Sharad Malik, et al. Precise miss analysis for program transformations with caches of arbitrary associativity, 1998, ASPLOS VIII.
[9] Ravindra K. Ahuja, et al. Network Flows: Theory, Algorithms, and Applications, 1993.
[10] Kathryn S. McKinley, et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.
[11] Zhiyuan Li, et al. New tiling techniques to improve cache temporal locality, 1999, PLDI '99.
[12] Chau-Wen Tseng, et al. A Comparison of Compiler Tiling Algorithms, 1999, CC.
[13] Mahmut T. Kandemir, et al. A matrix-based approach to the global locality optimization problem, 1998, Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques.
[14] Adams. Multigrid software for elliptic partial differential equations: MUDPACK. Technical note, 1991.
[15] Tarek S. Abdelrahman, et al. Fusion of Loops for Parallelism and Locality, 1997, IEEE Trans. Parallel Distributed Syst.
[16] Michael E. Wolf, et al. Improving locality and parallelism in nested loops, 1992.
[17] Ken Kennedy. Fast greedy weighted fusion, 2000, ICS '00.
[18] Fred G. Gustavson, et al. Recursive Formulation of Some Dense Linear Algebra Algorithms, 1999, PPSC.
[19] Jingling Xue, et al. Loop Tiling for Parallelism, 2000, Kluwer International Series in Engineering and Computer Science.
[20] Monica S. Lam, et al. Data and computation transformations for multiprocessors, 1995, PPoPP '95.
[21] David A. Patterson, et al. Computer architecture (2nd ed.): a quantitative approach, 1996.
[22] Keshav Pingali, et al. Transformations for Imperfectly Nested Loops, 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[23] Zhiyuan Li, et al. Experience with efficient array data flow analysis for array privatization, 1997, PPoPP '97.
[24] William Pugh, et al. A practical algorithm for exact array dependence analysis, 1992, CACM.
[25] Larry Carter, et al. Quantifying the Multi-Level Nature of Tiling Interactions, 1997, International Journal of Parallel Programming.
[26] William Pugh, et al. Iteration Space Slicing for Locality, 1999, LCPC.
[27] D. Burger, et al. Memory Bandwidth Limitations of Future Microprocessors, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA '96).
[28] David G. Wonnacott, et al. Achieving Scalable Locality with Time Skewing, 2002, International Journal of Parallel Programming.
[29] Jacqueline Chame, et al. A tile selection algorithm for data locality and cache interference, 1999, ICS '99.
[30] Larry Carter, et al. Schedule-independent storage mapping for loops, 1998, ASPLOS VIII.
[31] Ken Kennedy, et al. Automatic translation of FORTRAN programs to vector form, 1987, ACM TOPLAS.
[32] Monica S. Lam, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[33] Keshav Pingali, et al. Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests, 2001, International Journal of Parallel Programming.
[34] Vicki H. Allan, et al. Software pipelining, 1995, CSUR.
[35] Hiroshi Nakamura, et al. Augmenting Loop Tiling with Data Alignment for Improved Cache Performance, 1999, IEEE Trans. Computers.
[36] Ken Kennedy, et al. Improving effective bandwidth through compiler enhancement of global cache reuse, 2001, Proceedings 15th International Parallel and Distributed Processing Symposium (IPDPS 2001).
[37] William Pugh, et al. Exploiting Monotone Convergence Functions in Parallel Programs, 1996, LCPC.
[38] Rudolf Eigenmann, et al. Nonlinear and Symbolic Data Dependence Testing, 1998, IEEE Trans. Parallel Distributed Syst.
[39] Yves Robert, et al. Static tiling for heterogeneous computing platforms, 1999, Parallel Comput.
[40] Zhiyuan Li, et al. Automatic tiling of iterative stencil loops, 2004.
[41] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.
[42] Olivier Temam, et al. Cache interference phenomena, 1994, SIGMETRICS.
[43] Michael Wolfe, et al. High performance compilers for parallel computing, 1995.
[44] Zhiyuan Li, et al. Interprocedural Analysis for Loop Scheduling and Data Allocation, 1998, Parallel Comput.
[45] J. Ramanujam, et al. Loop optimization for a class of memory-constrained computations, 2001, ICS '01.
[46] Chau-Wen Tseng, et al. Tiling Optimizations for 3D Scientific Computations, 2000, ACM/IEEE SC 2000 Conference (SC '00).
[47] Vivek Sarkar, et al. A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness, 1994, CASCON.
[48] Guohua Jin, et al. Increasing Temporal Locality with Skewing and Recursive Blocking, 2001, ACM/IEEE SC 2001 Conference (SC '01).
[49] William W. Pugh, et al. Fine-grained analysis of array computations, 1998.
[50] Ken Kennedy, et al. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution, 1993, LCPC.
[51] J.-F. Collard. Space-time transformation of while-loops using speculative execution, 1994, Proceedings of IEEE Scalable High Performance Computing Conference.
[52] Keshav Pingali, et al. Data-centric multi-level blocking, 1997, PLDI '97.
[53] Vivek Sarkar. Loop Transformations for Hierarchical Parallelism and Locality, 1998, LCR.
[54] Michael F. P. O'Boyle, et al. Non-singular data transformations: definition, validity and applications, 1997, ICS '97.
[55] Ken Kennedy, et al. RETROSPECTIVE: Coloring Heuristics for Register Allocation, 2022.
[56] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.