A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations
Mahmut T. Kandemir | Alok N. Choudhary | J. Ramanujam | Meenakshi A. Kandaswamy
[1] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPLAS.
[2] Keshav Pingali,et al. Access normalization: loop restructuring for NUMA computers , 1993, TOCS.
[3] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[4] Rajesh R. Bordawekar,et al. Techniques for compiling i/o intensive parallel programs , 1996 .
[5] L. C. Smith. PASSION Runtime Library for Parallel I/O , 1994 .
[6] Ken Kennedy,et al. Automatic data layout for distributed-memory machines , 1998, TOPLAS.
[7] Monica S. Lam,et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst.
[8] Monica S. Lam,et al. Global optimizations for parallelism and locality on scalable parallel machines , 1993, PLDI '93.
[9] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.
[10] Edward G. Coffman,et al. Organizing matrices and matrix operations for paged memory systems , 1969, Commun. ACM.
[11] Alok N. Choudhary,et al. Automatic optimization of communication in compiling out-of-core stencil codes , 1996, ICS '96.
[12] Mahmut T. Kandemir,et al. A unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations , 1997, IOPADS '97.
[13] Steven W. K. Tjiang,et al. SUIF: an infrastructure for research on parallelizing and optimizing compilers , 1994, SIGPLAN Notices.
[14] Alok Choudhary,et al. PASSION Runtime Library for parallel I/O , 1994, Proceedings Scalable Parallel Libraries Conference.
[15] Mahmut T. Kandemir,et al. Improving the performance of out-of-core computations , 1997, Proceedings of the 1997 International Conference on Parallel Processing.
[16] Mahmut T. Kandemir,et al. Data access reorganizations in compiling out-of-core data parallel programs on distributed memory machines , 1997, Proceedings 11th International Parallel Processing Symposium.
[17] Monica S. Lam,et al. Data and computation transformations for multiprocessors , 1995, PPOPP '95.
[18] J. Ramanujam,et al. Integrating Data Distribution and Loop Transformations , 1995, PPSC.
[19] Rajeev Thakur,et al. Compilation of out-of-core data parallel programs for distributed memory machines , 1994, CARN.
[20] Mary E. Mace. Memory storage patterns in parallel processing , 1987, The Kluwer international series in engineering and computer science.
[21] Keshav Pingali,et al. Access normalization: loop restructuring for NUMA compilers , 1992, ASPLOS V.
[22] Margaret Martonosi,et al. Evaluating the impact of advanced memory systems on compiler-parallelized codes , 1995, PACT.
[23] Thomas H. Cormen,et al. ViC*: A Preprocessor for Virtual-Memory C* , 1994 .
[24] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[25] Wei Li,et al. Compiling for NUMA Parallel Machines , 1993 .
[26] Ken Kennedy,et al. Compiling Fortran D for MIMD distributed-memory machines , 1992, CACM.
[27] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[28] Todd C. Mowry,et al. Automatic compiler-inserted I/O prefetching for out-of-core applications , 1996, OSDI '96.
[29] Manish Gupta,et al. Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers , 1992, IEEE Trans. Parallel Distributed Syst.
[30] Isidoro Couvertier-Reyes,et al. Automatic Data and Computation Mapping for Distributed-Memory Machines. , 1996 .
[31] Duncan H. Lawrie,et al. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations , 1981, IEEE Transactions on Computers.
[32] Amit Narayan,et al. Automatic Data Mapping and Program Transformations , 1995 .
[33] Jack J. Dongarra,et al. A set of level 3 basic linear algebra subprograms , 1990, TOMS.
[34] Kishor S. Trivedi. On the Paging Performance of Array Algorithms , 1977, IEEE Transactions on Computers.
[35] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[36] Ken Kennedy,et al. A model and compilation strategy for out-of-core data parallel programs , 1995, PPOPP '95.
[37] Vivek Sarkar,et al. On Estimating and Enhancing Cache Effectiveness , 1991, LCPC.
[38] John Zahorjan,et al. Optimizing Data Locality by Array Restructuring , 1995 .
[39] Mahmut T. Kandemir,et al. Global I/O optimizations for out-of-core computations , 1997, Proceedings Fourth International Conference on High-Performance Computing.
[40] Ken Kennedy,et al. Compiler support for out-of-core arrays on parallel machines , 1995, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation.
[41] Wei Li,et al. Unifying data and control transformations for distributed shared-memory machines , 1995, PLDI '95.
[42] Michael F. P. O'Boyle,et al. Non-singular data transformations: definition, validity and applications , 1997, ICS '97.
[43] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1990 .
[44] A. C. McKellar,et al. The organization of matrices and matrix operations in a paged multiprogramming environment , 1968 .
[45] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.