Optimizing irregular shared-memory applications for distributed-memory systems
[1] OpenMP: A Proposed Industry Standard API for Shared Memory Programming, 2022.
[2] Ken Kennedy, et al. Improving cache performance in dynamic applications through data and computation reorganization at run time, 1999, PLDI '99.
[3] Chau-Wen Tseng, et al. Compiler optimizations for improving data locality, 1994, ASPLOS VI.
[4] Frederica Darema, et al. A single-program-multiple-data computational model for EPEX/FORTRAN, 1988, Parallel Comput.
[5] Katherine Yelick, et al. Titanium Language Reference Manual, 2001.
[6] Larry Carter, et al. Localizing non-affine array references, 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No. PR00425).
[7] Rudolf Eigenmann, et al. Towards automatic translation of OpenMP to MPI, 2005, ICS '05.
[8] David F. Heidel, et al. An Overview of the BlueGene/L Supercomputer, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[9] Chau-Wen Tseng, et al. A Comparison of Locality Transformations for Irregular Codes, 2000, LCR.
[10] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[11] Harry Berryman, et al. Distributed Memory Compiler Design for Sparse Problems, 1995, IEEE Trans. Computers.
[12] Rice University, et al. High Performance Fortran Language Specification, 1993.
[13] Rudolf Eigenmann, et al. SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance, 2001, WOMPAT.
[14] Hironori Kasahara, et al. Data-localization for Fortran macro-dataflow computation using partial static task assignment, 1996, ICS '96.
[15] M. Karplus, et al. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations, 1983.
[16] Rudolf Eigenmann, et al. Cetus - An Extensible Compiler Infrastructure for Source-to-Source Transformation, 2003, LCPC.
[17] Jimmy Su, et al. Automatic support for irregular computations in a high-level language, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[18] Joel H. Saltz, et al. The Design and Implementation of a Parallel Unstructured Euler Solver Using Software Primitives, ICASE Report No. 92-12, 2022.
[19] Joel H. Saltz, et al. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures, 1994, J. Parallel Distributed Comput.
[20] Gagan Agrawal, et al. Porting and performance evaluation of irregular codes using OpenMP, 2000, Concurr. Pract. Exp.
[21] Rudolf Eigenmann, et al. Optimizing OpenMP Programs on Software Distributed Shared Memory Systems, 2004, International Journal of Parallel Programming.
[22] José E. Moreira, et al. An Overview of the BlueGene/L System Software Organization, 2003, Parallel Process. Lett.
[23] Ken Kennedy, et al. Loop distribution with arbitrary control flow, 1990, Proceedings SUPERCOMPUTING '90.
[24] Greg Burns, et al. LAM: An Open Cluster Environment for MPI, 2002.
[25] Alan L. Cox, et al. Compiler and software distributed shared memory support for irregular applications, 1997, PPOPP '97.
[26] G. Liu, et al. Overlap of Computation and Communication on Shared-Memory, 1999, Scalable Comput. Pract. Exp.
[27] Dimitri J. Mavriplis, et al. The design and implementation of a parallel unstructured Euler solver using software primitives, 1992.
[28] Ken Kennedy, et al. An Implementation of Interprocedural Bounded Regular Section Analysis, 1991, IEEE Trans. Parallel Distributed Syst.
[29] David A. Padua, et al. Array privatization for shared and distributed memory machines (extended abstract), 1993, SIGP.
[30] Edith Schonberg, et al. An HPF Compiler for the IBM SP2, 1995, Proceedings of the IEEE/ACM SC95 Conference.
[31] Katherine A. Yelick, et al. Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY, 2001, International Conference on Computational Science.
[32] Toshio Nakatani, et al. A Loop Transformation Algorithm for Communication Overlapping, 2004, International Journal of Parallel Programming.
[33] Horst D. Simon, et al. Partitioning of unstructured problems for parallel processing, 1991.
[34] Matt W. Mutka, et al. Enabling unimodular transformations, 1994, Proceedings of Supercomputing '94.
[35] Anthony Skjellum, et al. A high-performance, portable implementation of the MPI message passing interface standard, 1996.
[36] Prithviraj Banerjee, et al. Techniques to overlap computation and communication in irregular iterative applications, 1994, ICS '94.