An Integer Linear Programming Approach for Optimizing Cache Locality
[1] Mahmut T. Kandemir,et al. A graph based framework to detect optimal memory layouts for improving data locality , 1999, Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999.
[2] Wei Li,et al. Unifying data and control transformations for distributed shared-memory machines , 1995, PLDI '95.
[3] Ricardo Bianchini,et al. Application Performance on the MIT Alewife Machine , 1996, Computer.
[4] Dennis Gannon,et al. Strategies for cache and local memory management by global program transformation , 1988, J. Parallel Distributed Comput..
[5] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPLAS.
[6] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[7] Michael F. P. O'Boyle,et al. Non-singular data transformations: definition, validity and applications , 1997, ICS '97.
[8] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1990 .
[9] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[10] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.
[11] Mahmut T. Kandemir,et al. A matrix-based approach to the global locality optimization problem , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[12] Michael F. P. O'Boyle,et al. Integrating loop and data transformations for global optimisation , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[13] Vivek Sarkar,et al. Locality Analysis for Distributed Shared-Memory Multiprocessors , 1996, LCPC.
[14] Mahmut T. Kandemir,et al. Improving locality using loop and data transformations in an integrated framework , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[15] Wei Li,et al. Compiling for NUMA Parallel Machines , 1993 .
[16] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[17] Steve Carr,et al. Combining optimization for cache and instruction-level parallelism , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques.
[18] William Pugh,et al. The Omega Library interface guide , 1995 .
[19] Robert J. Harrison,et al. High-Performance Computational Chemistry: Hartree-Fock Electronic Structure Calculations on Massively Parallel Processors , 1999, Int. J. High Perform. Comput. Appl..
[20] Dennis Gannon,et al. Strategies for cache and local memory management by global program transformation , 1988, J. Parallel Distributed Comput..
[21] Mahmut T. Kandemir,et al. A hyperplane based approach for optimizing spatial locality in loop nests , 1998, ICS '98.
[22] Jacqueline Chame,et al. The combined effectiveness of unimodular transformations, tiling, and software prefetching , 1996, Proceedings of International Conference on Parallel Processing.
[23] Ken Kennedy,et al. Automatic Data Layout for High Performance Fortran , 1995, SC.
[24] Dennis Gannon,et al. Strategies for cache and local memory management by global program transformation , 1988 .
[25] Vivek Sarkar,et al. On Estimating and Enhancing Cache Effectiveness , 1991, LCPC.
[26] E. Ayguade,et al. A Novel Approach Towards Automatic Data Distribution , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[27] John Zahorjan,et al. Optimizing Data Locality by Array Restructuring , 1995 .