Combining Loop Transformations Considering Caches and Scheduling
[1] Yiping Guan. Unroll-And-Jam Guided by A Linear-Algebra-Based Data-Reuse Model, 1995.
[2] Michael E. Wolf,et al. Improving locality and parallelism in nested loops, 1992.
[3] Michael E. Wolf,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[4] Dennis Gannon,et al. Strategies for cache and local memory management by global program transformation , 1988, J. Parallel Distributed Comput.
[5] Ken Kennedy,et al. Estimating Interlock and Improving Balance for Pipelined Architectures , 1988, J. Parallel Distributed Comput.
[6] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[7] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.
[8] John L. Hennessy,et al. Efficient and exact data dependence analysis, 1991.
[9] Chau-Wen Tseng,et al. Compiler optimizations for improving data locality , 1994, ASPLOS VI.
[10] Michael Wolfe,et al. High performance compilers for parallel computing, 1995.
[11] Steve Carr,et al. Combining optimization for cache and instruction-level parallelism , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques.
[12] Ken Kennedy,et al. Improving the ratio of memory operations to floating-point operations in loops , 1994, TOPLAS.
[13] Monica S. Lam,et al. Efficient and exact data dependence analysis , 1991, PLDI '91.
[14] Guang R. Gao,et al. Software pipelining showdown: optimal vs. heuristic methods in a production compiler , 1996, PLDI '96.