Hierarchical tiling for improved superscalar performance
[1] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[2] Alan Jay Smith,et al. Machine Characterization Based on an Abstract High-Level Language Machine , 1989, IEEE Trans. Computers.
[3] Daniel A. Reed,et al. Stencils and Problem Partitionings: Their Influence on the Performance of Multiple Processor Systems , 1987, IEEE Trans. Computers.
[4] David A. Padua,et al. Advanced compiler optimizations for supercomputers , 1986, CACM.
[5] Michael Wolfe,et al. Iteration Space Tiling for Memory Hierarchies , 1987, PPSC.
[6] Larry Carter,et al. Efficient Parallelism via Hierarchical Tiling , 1995, PPSC.
[7] William Jalby,et al. Optimizing matrix operations on a parallel multiprocessor with a memory hierarchy , 1986 .
[8] J. Ramanujam,et al. Tiling multidimensional iteration spaces for nonshared memory machines , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[9] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991 .
[10] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[11] Monica S. Lam,et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst.
[12] Santosh G. Abraham,et al. Compiler techniques for data partitioning of sequentially iterated parallel loops , 1990, ICS '90.
[13] Fung F. Lee. Partitioning of Regular Computation on Multiprocessor Systems , 1990, J. Parallel Distributed Comput.
[14] James R. Larus,et al. CICO: A Practical Shared-Memory Programming Performance Model , 1994 .
[15] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[16] Bowen Alpern,et al. Modeling parallel computers as memory hierarchies , 1993, Proceedings of Workshop on Programming Models for Massively Parallel Computers.
[17] Dennis Gannon,et al. Strategies for cache and local memory management by global program transformation , 1988, J. Parallel Distributed Comput.
[18] Santosh G. Abraham,et al. Compile-Time Partitioning of Iterative Parallel Loops to Reduce Cache Coherency Traffic , 1991, IEEE Trans. Parallel Distributed Syst.
[19] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.