Generating efficient tiled code for distributed memory machines
[1] Ken Kennedy,et al. Compiler blockability of numerical algorithms , 1992, Proceedings Supercomputing '92.
[2] Jingling Xue. Communication-Minimal Tiling of Uniform Dependence Loops , 1997, J. Parallel Distributed Comput..
[3] Charles Koelbel. Compile-time generation of regular communications patterns , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[4] Yves Robert,et al. On the removal of anti and output dependences , 1996, Proceedings of International Conference on Application Specific Systems, Architectures and Processors: ASAP '96.
[5] Guy L. Steele,et al. The High Performance Fortran Handbook , 1993 .
[6] Larry Carter,et al. Efficient Parallelism via Hierarchical Tiling , 1995, PPSC.
[7] Weijia Shang,et al. Independent Partitioning of Algorithms with Uniform Dependencies , 1992, IEEE Trans. Computers.
[8] John R. Gilbert,et al. Generating local addresses and communication sets for data-parallel programs , 1993, PPOPP '93.
[9] Paul Feautrier,et al. Construction of Do Loops from Systems of Affine Constraints , 1995, Parallel Process. Lett..
[10] David K. Smith. Theory of Linear and Integer Programming , 1987 .
[11] Ken Kennedy,et al. Practical dependence testing , 1991, PLDI '91.
[12] Sanjay V. Rajopadhye,et al. Optimal Orthogonal Tiling of 2-D Iterations , 1997, J. Parallel Distributed Comput..
[13] Yves Robert,et al. Tiling with limited resources , 1997, Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors.
[14] Michael E. Wolf,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[15] Hiroshi Ohta,et al. Optimal tile size adjustment in compiling general DOACROSS loop nests , 1995, ICS '95.
[16] Ken Kennedy,et al. Compiling Fortran D for MIMD distributed-memory machines , 1992, CACM.
[17] J. Ramanujam,et al. Fast Address Sequence Generation for Data-Parallel Programs Using Integer Lattices , 1995, LCPC.
[18] Yves Robert,et al. Evaluating Array Expressions On Massively Parallel Machines With Communication/Computation Overlap , 1995, Int. J. High Perform. Comput. Appl..
[19] S. Rajopadhye. Optimal Tiling of Two-Dimensional Uniform Recurrences , 1996 .
[20] Siegfried Benkner,et al. Vienna Fortran 90 , 1992, Proceedings Scalable High Performance Computing Conference SHPCC-92..
[21] Larry Carter,et al. Quantifying the Multi-level Nature of Tiling Interactions , 1997, LCPC.
[22] Fabien Coelho,et al. State of the Art in Compiling HPF , 1996, The Data Parallel Programming Model.
[23] Monica S. Lam,et al. A data locality optimizing algorithm (with retrospective) , 1991 .
[24] Peiyi Tang,et al. Reducing data communication overhead for DOACROSS loop nests , 1994, ICS '94.
[25] Jingling Xue,et al. Reuse-Driven Tiling for Data Locality , 1997, LCPC.
[26] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[27] Jingling Xue,et al. On Tiling as a Loop Transformation , 1997, Parallel Process. Lett..
[28] Jack Dongarra,et al. Automatic Blocking of Nested Loops , 1990 .
[29] William Pugh,et al. A practical algorithm for exact array dependence analysis , 1992, CACM.
[30] Peiyi Tang,et al. Implementing global address space in distributed local memories , 1994 .
[31] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[32] Sandeep K. S. Gupta,et al. On Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines , 1993, 1993 International Conference on Parallel Processing - ICPP'93.
[33] Paul Feautrier,et al. Optimizing Storage Size for Static Control Programs in Automatic Parallelizers , 1997, Euro-Par.
[34] J. Ramanujam,et al. Tiling Multidimensional Iteration Spaces for Multicomputers , 1992, J. Parallel Distributed Comput..
[35] Frédéric Vivien,et al. Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling , 1997, Parallel Process. Lett..
[36] William Pugh,et al. The Omega test: A fast and practical integer programming algorithm for dependence analysis , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[37] David A. Padua,et al. Advanced compiler optimizations for supercomputers , 1986, CACM.
[38] Zhiyuan Li,et al. Symbolic Array Dataflow Analysis for Array Privatization and Program Parallelization , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[39] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[40] Monica S. Lam,et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..
[41] Chien-Min Wang,et al. Tiling Nested Loops into Maximal Rectangular Blocks , 1996, J. Parallel Distributed Comput..
[42] Michael Gerndt,et al. Updating Distributed Variables in Local Computations , 1990, Concurr. Pract. Exp..