Static and Dynamic Locality Optimizations Using Integer Linear Programming
Mahmut T. Kandemir | Eduard Ayguadé | Alok N. Choudhary | J. Ramanujam | Prithviraj Banerjee
[1] Vivek Sarkar,et al. On Estimating and Enhancing Cache Effectiveness , 1991, LCPC.
[2] Walid Abu-Sufah,et al. Improving the performance of virtual memory computers , 1979 .
[3] E. Ayguade,et al. A Novel Approach Towards Automatic Data Distribution , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[4] John Zahorjan,et al. Optimizing Data Locality by Array Restructuring , 1995 .
[5] Chau-Wen Tseng,et al. Data transformations for eliminating conflict misses , 1998, PLDI.
[6] K. Kennedy,et al. Automatic Data Layout for High Performance Fortran , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[7] Barbara M. Chapman,et al. Supercompilers for parallel and vector computers , 1990, ACM Press frontier series.
[8] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[9] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[10] Milind Girkar,et al. Parafrase-2: an Environment for Parallelizing, Partitioning, Synchronizing, and Scheduling Programs on Multiprocessors , 1989, Int. J. High Speed Comput..
[11] David A. Patterson,et al. Computer architecture (2nd ed.): a quantitative approach , 1996 .
[12] Mahmut T. Kandemir,et al. A graph based framework to detect optimal memory layouts for improving data locality , 1999, Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999.
[13] S. Turner,et al. Performance Analysis Using the MIPS R10000 Performance Counters , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[14] Ricardo Bianchini,et al. Application Performance on the MIT Alewife Machine , 1996, Computer.
[15] Jordi Torres,et al. Partitioning the statement per iteration space using non-singular matrices , 1993, ICS '93.
[16] William Pugh,et al. The Omega Library interface guide , 1995 .
[17] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[18] Steve Carr,et al. Combining optimization for cache and instruction-level parallelism , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.
[19] Nenad Nedeljkovic,et al. Data distribution support on distributed shared memory multiprocessors , 1997, PLDI '97.
[20] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.
[21] Wei Li,et al. Compiling for NUMA Parallel Machines , 1993 .
[22] Steven S. Muchnick,et al. Advanced Compiler Design and Implementation , 1997 .
[23] K.M. Dixit. New CPU benchmark suites from SPEC , 1992, Digest of Papers COMPCON Spring 1992.
[24] Manish Gupta,et al. Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers , 1992, IEEE Trans. Parallel Distributed Syst..
[25] Monica S. Lam,et al. Data and computation transformations for multiprocessors , 1995, PPOPP '95.
[26] Keshav Pingali,et al. Transformations for Imperfectly Nested Loops , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[27] Laurence A. Wolsey,et al. Integer and Combinatorial Optimization , 1988, Wiley interscience series in discrete mathematics and optimization.
[28] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[29] Mahmut T. Kandemir,et al. A hyperplane based approach for optimizing spatial locality in loop nests , 1998, ICS '98.
[30] Jacqueline Chame,et al. The combined effectiveness of unimodular transformations, tiling, and software prefetching , 1996, Proceedings of International Conference on Parallel Processing.
[31] Henry G. Dietz,et al. Reduction of Cache Coherence Overhead by Compiler Data Layout and Loop Transformation , 1991, LCPC.
[32] Eduard Ayguade,et al. Dynamic data distribution with control flow analysis , 1996, Supercomputing '96.
[33] Prithviraj Banerjee,et al. Automatic Selection of Dynamic Data Partitioning Schemes for Distributed-Memory Multicomputers , 1995, LCPC.
[34] Susan J. Eggers,et al. Reducing false sharing on shared memory multiprocessors through compile time data transformations , 1995, PPOPP '95.
[35] Marina C. Chen,et al. Compiling Communication-Efficient Programs for Massively Parallel Machines , 1991, IEEE Trans. Parallel Distributed Syst..
[36] Tarek S. Abdelrahman,et al. Automatic partitioning of data and computations on scalable shared memory multiprocessors , 1997, Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162).
[37] Wei Li,et al. Unifying data and control transformations for distributed shared-memory machines , 1995, PLDI '95.
[38] Josep Torrellas,et al. False Sharing and Spatial Locality in Multiprocessor Caches , 1994, IEEE Trans. Computers.
[39] Michael F. P. O'Boyle,et al. Non-singular data transformations: definition, validity and applications , 1997, ICS '97.
[40] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[41] Mahmut T. Kandemir,et al. Locality Optimization Algorithms for Compilation of Out-of-Core Codes , 1998, J. Inf. Sci. Eng..
[42] Mahmut T. Kandemir,et al. A matrix-based approach to the global locality optimization problem , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[43] 原田 秀逸. My computer environment , 1998 .
[44] Olivier Temam,et al. A quantitative analysis of loop nest locality , 1996, ASPLOS VII.
[45] Geoffrey C. Fox,et al. The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers , 1989, Int. J. High Perform. Comput. Appl..
[46] F. H. Mcmahon,et al. The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range , 1986 .
[47] Michael F. P. O'Boyle,et al. Integrating loop and data transformations for global optimisation , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[48] Ken Kennedy,et al. Software methods for improvement of cache performance on supercomputer applications , 1989 .
[49] Vivek Sarkar,et al. Locality Analysis for Distributed Shared-Memory Multiprocessors , 1996, LCPC.
[50] Mahmut T. Kandemir,et al. Improving locality using loop and data transformations in an integrated framework , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[51] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPLAS.
[52] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[53] Susan J. Eggers,et al. Eliminating False Sharing , 1991, ICPP.
[54] William Jalby,et al. Impact of cache interferences on usual numerical dense loop nests , 1993 .
[55] Mahmut T. Kandemir,et al. A compiler algorithm for optimizing locality in loop nests , 1997, ICS '97.
[56] Mahmut T. Kandemir,et al. An integer linear programming approach for optimizing cache locality , 1999, ICS '99.