Loop Distribution and Fusion with Timing and Code Size Optimization
Yi He | Meikang Qiu | Edwin Hsing-Mean Sha | Qingfeng Zhuge | Meilin Liu
[1] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPL.
[2] Gilles Villard, et al. Lattice-Based Memory Allocation, 2005, IEEE Trans. Computers.
[3] Francky Catthoor, et al. Custom Memory Management Methodology, 1998, Springer US.
[4] Martin Palkovic, et al. Memory requirement optimization with loop fusion and loop shifting, 2004.
[5] Ken Kennedy, et al. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution, 1993, LCPC.
[6] D. Burger, et al. Memory Bandwidth Limitations of Future Microprocessors, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[7] Gerda Janssens, et al. Multi-dimensional incremental loop fusion for data locality, 2003, Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003.
[8] Ken Kennedy, et al. Optimizing for parallelism and data locality, 1992.
[9] Yan Solihin, et al. Predicting inter-thread cache contention on a chip multi-processor architecture, 2005, 11th International Symposium on High-Performance Computer Architecture.
[10] Ken Kennedy, et al. Transforming Complex Loop Nests for Locality, 2004, The Journal of Supercomputing.
[11] Michael Wolfe, et al. High performance compilers for parallel computing, 1995.
[12] Keith D. Cooper, et al. Engineering a Compiler, 2003.
[13] Edwin Hsing-Mean Sha, et al. General loop fusion technique for nested loops considering timing and code size, 2004, CASES '04.
[14] Tarek S. Abdelrahman, et al. Fusion of Loops for Parallelism and Locality, 1997, IEEE Trans. Parallel Distributed Syst.
[15] Monica S. Lam, et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism, 1991, IEEE Trans. Parallel Distributed Syst.
[16] Francky Catthoor, et al. Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design, 1998.
[17] Ken Kennedy, et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2001.
[18] Ken Kennedy, et al. Loop distribution with arbitrary control flow, 1990, Proceedings SUPERCOMPUTING '90.
[19] Edwin Hsing-Mean Sha, et al. Optimizing Overall Loop Schedules Using Prefetching and Partitioning, 2000, IEEE Trans. Parallel Distributed Syst.
[20] Edwin Hsing-Mean Sha, et al. Register aware scheduling for distributed cache clustered architecture, 2003, ASP-DAC '03.
[21] Ken Kennedy, et al. Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion, 2004, Int. J. High Perform. Comput. Appl.
[22] Erik Brockmeyer, et al. Data and memory optimization techniques for embedded systems, 2001, TODE.
[23] Edwin Hsing-Mean Sha, et al. Polynomial-time nested loop fusion with full parallelism, 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.
[24] Alain Darte, et al. On the complexity of loop fusion, 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).