Performance Portable Optimizations for Loops Containing Communication Operations
[1] Katherine Yelick,et al. Titanium: a high-performance Java dialect, 1998.
[2] Ken Kennedy,et al. Compiling Fortran D for MIMD distributed-memory machines , 1992, CACM.
[3] Robert W. Numrich,et al. Co-array Fortran for parallel programming , 1998, FORF.
[4] Katherine A. Yelick,et al. A performance analysis of the Berkeley UPC compiler , 2003, ICS '03.
[5] Lawrence Snyder,et al. Quantifying the effects of communication optimizations , 1997, Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162).
[6] Edith Schonberg,et al. A Unified Framework for Optimizing Communication in Data-Parallel Programs , 1996, IEEE Trans. Parallel Distributed Syst.
[7] Todd C. Mowry,et al. Tolerating latency through software-controlled data prefetching, 1994.
[8] J. Mellor-Crummey,et al. A multi-platform co-array Fortran compiler , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[9] John M. Mellor-Crummey,et al. Effective communication coalescing for data-parallel applications , 2005, PPOPP.
[10] Dhabaleswar K. Panda,et al. Zero-Copy MPI Derived Datatype Communication over InfiniBand , 2004, PVM/MPI.
[11] D. Martin Swany,et al. Transformations to Parallel Codes for Communication-Computation Overlap , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[12] Sharad Malik,et al. Cache miss equations: an analytical representation of cache misses , 1997, ICS '97.
[13] Edith Schonberg,et al. An HPF Compiler for the IBM SP2 , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[14] V. Tipparaju,et al. Optimizing strided remote memory access operations on the Quadrics QsNetII network interconnect , 2005, Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05).
[15] Mahmut T. Kandemir,et al. Minimizing Data and Synchronization Costs in One-Way Communication , 2000, IEEE Trans. Parallel Distributed Syst.
[16] Bryan Carpenter,et al. ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-Time Systems , 1999, IPPS/SPDP Workshops.
[17] Erich Strohmaier,et al. Optimizing communication overlap for high-speed networks , 2007, PPoPP.
[18] Prithviraj Banerjee,et al. Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers , 1995, ICS '95.
[19] Dan Bonachea. GASNet Specification, v1.1, 2002.
[20] Laurie J. Hendren,et al. Communication optimizations for parallel C programs , 1998, J. Parallel Distributed Comput..
[21] Katherine A. Yelick,et al. Optimizing bandwidth limited problems using one-sided communication and overlap , 2005, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[22] Jimmy Su,et al. Automatic support for irregular computations in a high-level language , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[23] Dan Bonachea. Proposal for extending the UPC memory copy library functions and supporting extensions to GASNet, 2004.
[24] Katherine Yelick,et al. A proposal for a UPC memory consistency model, v1.0, 2004.
[25] Mahmut T. Kandemir,et al. A global communication optimization technique based on data-flow analysis and linear algebra , 1999, TOPL.
[26] Yunheung Paek,et al. Compiling for Distributed Memory Multiprocessors Based on Access Region Analysis, 1997.
[27] Chris J. Scheiman,et al. LogGP: Incorporating Long Messages into the LogP Model for Parallel Computation , 1997, J. Parallel Distributed Comput..
[28] Jong-Deok Choi,et al. Global communication analysis and optimization , 1996, PLDI '96.
[29] Paul D. Gader,et al. Image algebra techniques for parallel image processing, 1987.
[30] Ramesh Subramonian,et al. LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.
[31] Geoffrey C. Fox,et al. The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers , 1989, Int. J. High Perform. Comput. Appl..
[32] F. H. McMahon,et al. The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range, 1986.
[33] Yunheung Paek,et al. Efficient and precise array access analysis , 2002, TOPL.