Optimized Unrolling of Nested Loops
[1] Vivek Sarkar, et al. On Estimating and Enhancing Cache Effectiveness, 1991, LCPC.
[2] Ken Kennedy, et al. Improving register allocation for subscripted variables, 1990, PLDI '90.
[3] B. Ramakrishna Rau, et al. Iterative modulo scheduling: an algorithm for software pipelining loops, 1994, MICRO 27.
[4] Bruce R. Childers, et al. Memory bandwidth optimizations for wide-bus machines, 1993, Proceedings of the Twenty-sixth Hawaii International Conference on System Sciences.
[5] Vicki H. Allan, et al. Software pipelining: an evaluation of enhanced pipelining, 1991, MICRO 24.
[6] David F. Bacon, et al. Compiler transformations for high-performance computing, 1994, CSUR.
[7] Alexandru Nicolau, et al. Parallel processing: a smart compiler and a dumb machine, 1984, SIGP.
[8] Steve Carr, et al. Unroll-and-jam using uniformly generated sets, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[9] Vivek Sarkar, et al. Automatic selection of high-order transformations in the IBM XL FORTRAN compilers, 1997, IBM J. Res. Dev.
[10] Jian Wang, et al. GURPR—a method for global software pipelining, 1987, MICRO 20.
[11] Jack J. Dongarra, et al. Unrolling loops in Fortran, 1979, Softw. Pract. Exp.
[12] James E. Smith, et al. A study of scalar compilation techniques for pipelined supercomputers, 1987, ASPLOS.
[13] Ken Kennedy, et al. Software methods for improvement of cache performance on supercomputer applications, 1989.
[14] Vivek Sarkar, et al. Determining average program execution times and their variance, 1989, PLDI '89.
[15] Ken Kennedy, et al. Improving the ratio of memory operations to floating-point operations in loops, 1994, TOPL.
[16] Vivek Sarkar, et al. An optimal asynchronous scheduling algorithm for software cache consistency, 1994, Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.
[17] Ken Kennedy, et al. Scalar replacement in the presence of conditional control flow, 1994, Softw. Pract. Exp.
[18] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[19] V. Sarkar, et al. Automatic partitioning of a program dependence graph into parallel tasks, 1991, IBM J. Res. Dev.
[20] Wen-mei W. Hwu, et al. Unrolling-based optimizations for modulo scheduling, 1995, MICRO 1995.
[21] Vivek Sarkar, et al. A general framework for iteration-reordering loop transformations, 1992, PLDI '92.
[22] Todd C. Mowry, et al. Tolerating latency through software-controlled data prefetching, 1994.
[23] Sanjay Jinturkar, et al. Aggressive Loop Unrolling in a Retargetable Optimizing Compiler, 1996, CC.