On the Parallel Execution Time of Tiled Loops