A Matrix-Based Approach to Global Locality Optimization
Mahmut T. Kandemir | Alok N. Choudhary | J. Ramanujam | Prithviraj Banerjee
[1] Mahmut T. Kandemir,et al. Compiler Algorithms for Optimizing Locality and Parallelism on Shared and Distributed-Memory Machines , 2000, J. Parallel Distributed Comput..
[2] Wei Li,et al. Compiling for NUMA Parallel Machines , 1993 .
[3] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[4] Monica S. Lam,et al. Global optimizations for parallelism and locality on scalable parallel machines , 1993, PLDI '93.
[5] Michael E. Wolf,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[6] Vadim Maslov,et al. Delinearization: an efficient way to break multiloop dependence equations , 1992, PLDI '92.
[7] Chau-Wen Tseng,et al. Data transformations for eliminating conflict misses , 1998, PLDI.
[8] Duncan H. Lawrie,et al. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations , 1981, IEEE Transactions on Computers.
[9] Constantine D. Polychronopoulos,et al. Symbolic Analysis: A Basis for Parallelization, Optimization, and Scheduling of Programs , 1993, LCPC.
[10] P. Sadayappan,et al. Communication-Free Hyperplane Partitioning of Nested Loops , 1993, J. Parallel Distributed Comput..
[11] Mahmut T. Kandemir,et al. A matrix-based approach to the global locality optimization problem , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[12] William Pugh,et al. The Omega Library interface guide , 1995 .
[13] Mahmut Kandemir,et al. An Iteration Space Transformation Algorithm Based on Explicit Data Layout Representation for Optimizing Locality , 1999 .
[14] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[15] Mahmut T. Kandemir,et al. A hyperplane based approach for optimizing spatial locality in loop nests , 1998, ICS '98.
[16] Henry G. Dietz,et al. Reduction of Cache Coherence Overhead by Compiler Data Layout and Loop Transformation , 1991, LCPC.
[17] Ken Kennedy,et al. Improving the ratio of memory operations to floating-point operations in loops , 1994, TOPLAS.
[18] Ken Kennedy,et al. Automatic Data Layout for High Performance Fortran , 1995, SC.
[19] Marina C. Chen,et al. Compiling Communication-Efficient Programs for Massively Parallel Machines , 1991, IEEE Trans. Parallel Distributed Syst..
[20] Tarek S. Abdelrahman,et al. Automatic partitioning of data and computations on scalable shared memory multiprocessors , 1997, Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162).
[21] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPLAS.
[22] Alexander Schrijver,et al. Theory of linear and integer programming , 1986, Wiley-Interscience series in discrete mathematics and optimization.
[23] Vivek Sarkar,et al. A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness , 1994, CASCON.
[24] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[25] Rudolf Eigenmann,et al. An Overview of Symbolic Analysis Techniques Needed for the Effective Parallelization of the Perfect Benchmarks , 1994, 1994 International Conference on Parallel Processing Vol. 2.
[26] Edward G. Coffman,et al. Organizing matrices and matrix operations for paged memory systems , 1969, Commun. ACM.
[27] Nenad Nedeljkovic,et al. Data distribution support on distributed shared memory multiprocessors , 1997, PLDI '97.
[28] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.
[29] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[30] Chau-Wen Tseng,et al. Unified compilation techniques for shared and distributed address space machines , 1995, ICS '95.
[31] Mahmut T. Kandemir,et al. A Loop Transformation Algorithm Based on Explicit Data Layout Representation for Optimizing Locality , 1998, LCPC.
[32] J. Ramanujam,et al. Compile-Time Techniques for Data Distribution in Distributed Memory Machines , 1991, IEEE Trans. Parallel Distributed Syst..
[33] Yunheung Paek,et al. Advanced Program Restructuring for High-Performance Computers with Polaris , 2000 .
[34] John R. Gilbert,et al. Optimal evaluation of array expressions on massively parallel machines , 1995, TOPLAS.
[35] Jang-Ping Sheu,et al. Communication-Free Partitioning of Nested Loops , 2001, Compiler Optimizations for Scalable Parallel Systems Languages.
[36] Yunheung Paek,et al. Parallel Programming with Polaris , 1996, Computer.
[37] Michael F. P. O'Boyle,et al. Integrating loop and data transformations for global optimisation , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[38] A. Ibrahim. Linear and Integer Linear Programming , 1975 .
[39] Bernard Kolman,et al. Introductory Linear Algebra with Applications , 1976 .
[40] Monica S. Lam,et al. Data and computation transformations for multiprocessors , 1995, PPOPP '95.
[41] Monica S. Lam,et al. Automatic computation and data decomposition for multiprocessors , 1997 .
[42] Dennis Gannon,et al. Strategies for cache and local memory management by global program transformation , 1988, J. Parallel Distributed Comput..
[43] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[44] Wei Li,et al. Recovering Logical Data and Code Structures , 1995 .
[45] Vivek Sarkar,et al. On Estimating and Enhancing Cache Effectiveness , 1991, LCPC.
[46] E. Ayguade,et al. A Novel Approach Towards Automatic Data Distribution , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[47] John Zahorjan,et al. Optimizing Data Locality by Array Restructuring , 1995 .
[48] Alexandru Nicolau,et al. Advances in languages and compilers for parallel processing , 1991 .
[49] Wei Li,et al. Unifying data and control transformations for distributed shared-memory machines , 1995, PLDI '95.
[50] Michael F. P. O'Boyle,et al. Non-singular data transformations: definition, validity and applications , 1997, ICS '97.
[51] Steven J. Leon. Linear algebra with applications , 1986 .
[52] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.
[53] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[54] Mahmut T. Kandemir,et al. A compiler algorithm for optimizing locality in loop nests , 1997, ICS '97.
[55] Keshav Pingali,et al. Transformations for Imperfectly Nested Loops , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[56] A. C. McKellar,et al. The organization of matrices and matrix operations in a paged multiprogramming environment , 1968 .
[57] Josep Torrellas,et al. False Sharing and Spatial Locality in Multiprocessor Caches , 1994, IEEE Trans. Computers.
[58] Ken Kennedy,et al. Software methods for improvement of cache performance on supercomputer applications , 1989 .
[59] Vivek Sarkar,et al. Locality Analysis for Distributed Shared-Memory Multiprocessors , 1996, LCPC.
[60] Margaret Martonosi,et al. Evaluating the impact of advanced memory systems on compiler-parallelized codes , 1995, PACT.
[61] Yves Robert,et al. How to optimize residual communications? , 1996, Proceedings of International Conference on Parallel Processing.