A unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations
[1] Monica S. Lam et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism, 1991, IEEE Trans. Parallel Distributed Syst.
[2] Ken Kennedy et al. Compiler support for out-of-core arrays on parallel machines, 1995, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation.
[3] Kishor S. Trivedi. On the Paging Performance of Array Algorithms, 1977, IEEE Transactions on Computers.
[4] Ken Kennedy et al. A model and compilation strategy for out-of-core data parallel programs, 1995, PPOPP '95.
[5] Edward G. Coffman et al. Organizing matrices and matrix operations for paged memory systems, 1969, Commun. ACM.
[6] Ulrich Kremer et al. Automatic data layout for distributed-memory machines, 1998.
[7] Keshav Pingali et al. Access normalization: loop restructuring for NUMA compilers, 1992, ASPLOS V.
[8] Thomas H. Cormen et al. ViC*: A Preprocessor for Virtual-Memory C*, 1994.
[9] Michael Wolfe et al. High performance compilers for parallel computing, 1995.
[10] J. Ramanujam et al. Non-unimodular transformations of nested loops, 1992, Proceedings Supercomputing '92.
[11] J. Ramanujam et al. Integrating Data Distribution and Loop Transformations, 1995, PPSC.
[12] Carla Schlatter Ellis et al. Characterizing parallel file-access patterns on a large-scale multiprocessor, 1995, IPPS.
[13] A. C. McKellar et al. The organization of matrices and matrix operations in a paged multiprogramming environment, 1968.
[14] Margaret Martonosi et al. Evaluating the impact of advanced memory systems on compiler-parallelized codes, 1995, PACT.
[15] Guy L. Steele et al. The High Performance Fortran Handbook, 1993.
[16] Mahmut T. Kandemir et al. Data access reorganizations in compiling out-of-core data parallel programs on distributed memory machines, 1997, Proceedings 11th International Parallel Processing Symposium.
[17] Monica S. Lam et al. Data and computation transformations for multiprocessors, 1995, PPOPP '95.
[18] Ken Kennedy et al. Automatic data layout for distributed-memory machines, 1998, TOPLAS.
[19] Wei Li et al. Compiling for NUMA Parallel Machines, 1993.
[20] Ken Kennedy et al. Compiling Fortran D for MIMD distributed-memory machines, 1992, CACM.
[21] Todd C. Mowry et al. Automatic compiler-inserted I/O prefetching for out-of-core applications, 1996, OSDI '96.
[22] Duncan H. Lawrie et al. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations, 1981, IEEE Transactions on Computers.
[23] Amit Narayan et al. Automatic Data Mapping and Program Transformations, 1995.
[24] Alok N. Choudhary et al. Automatic optimization of communication in compiling out-of-core stencil codes, 1996, ICS '96.
[25] Wei Li et al. Unifying data and control transformations for distributed shared-memory machines, 1995, PLDI '95.
[26] Rajesh R. Bordawekar et al. Techniques for compiling I/O intensive parallel programs, 1996.
[27] Monica S. Lam et al. A data locality optimizing algorithm, 1991, PLDI '91.
[28] Chau-Wen Tseng et al. Improving data locality with loop transformations, 1996, TOPLAS.
[29] Keshav Pingali et al. Access normalization: loop restructuring for NUMA computers, 1993, TOCS.
[30] Monica S. Lam et al. Global optimizations for parallelism and locality on scalable parallel machines, 1993, PLDI '93.