Quantifying the Multi-level Nature of Tiling Interactions
[1] Vivek Sarkar,et al. On Estimating and Enhancing Cache Effectiveness , 1991, LCPC.
[2] Ken Kennedy,et al. Compiler blockability of numerical algorithms , 1992, Proceedings Supercomputing '92.
[3] Ken Kennedy,et al. Optimizing for parallelism and data locality , 1992 .
[4] Bowen Alpern,et al. Hierarchical Tiling: A Methodology for High Performance , 1996 .
[5] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[6] Larry Carter,et al. Determining the idle time of a tiling , 1997, POPL '97.
[7] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[8] David F. Bacon,et al. Compiler transformations for high-performance computing , 1994, CSUR.
[9] David A. Padua,et al. Advanced compiler optimizations for supercomputers , 1986, CACM.
[10] Paul Feautrier. Some Efficient Solutions to the Affine Scheduling Problem Part I One-dimensional Time , 1993 .
[11] Monica D. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991 .
[12] Jacqueline Chame,et al. The combined effectiveness of unimodular transformations, tiling, and software prefetching , 1996, Proceedings of International Conference on Parallel Processing.
[13] Ken Kennedy,et al. Improving the ratio of memory operations to floating-point operations in loops , 1994, TOPLAS.
[14] William Pugh,et al. A unifying framework for iteration reordering transformations , 1995, Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing.
[15] Anant Agarwal,et al. Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors , 1995, IEEE Trans. Parallel Distributed Syst..
[16] Vivek Sarkar,et al. A general framework for iteration-reordering loop transformations , 1992, PLDI '92.
[17] P. Feautrier. Some Efficient Solutions to the Affine Scheduling Problem Part I One-dimensional Time , 1996 .
[18] Monica S. Lam,et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..
[19] David A. Padua,et al. Advanced compiler optimizations for supercomputers , 1986 .
[20] Corinne Ancourt,et al. Scanning polyhedra with DO loops , 1991, PPOPP '91.
[21] Michael Wolfe,et al. Iteration Space Tiling for Memory Hierarchies , 1987, PPSC.
[22] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[23] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[24] Steve Carr,et al. Combining optimization for cache and instruction-level parallelism , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques.
[25] Chau-Wen Tseng,et al. Compiler optimizations for improving data locality , 1994, ASPLOS VI.
[26] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[27] Dennis Gannon,et al. Applying AI Techniques to Program Optimization for Parallel Computers , 1987 .
[28] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[29] Vivek Sarkar,et al. Locality Analysis for Distributed Shared-Memory Multiprocessors , 1996, LCPC.
[30] J. Ramanujam,et al. Tiling multidimensional iteration spaces for nonshared memory machines , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[31] Ken Kennedy,et al. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution , 1993, LCPC.
[32] Larry Carter,et al. Efficient Parallelism via Hierarchical Tiling , 1995, PPSC.
[33] Larry Carter,et al. Hierarchical tiling for improved superscalar performance , 1995, Proceedings of 9th International Parallel Processing Symposium.