Optimization principles for collective neighborhood communications
[1] Richard M. Karp,et al. Optimal broadcast and summation in the LogP model , 1993, SPAA '93.
[2] George Bosilca,et al. High Performance RDMA Protocols in HPC , 2006, PVM/MPI.
[3] Torsten Hoefler,et al. Implementation and performance analysis of non-blocking collective operations for MPI , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[4] Michael M. Resch,et al. Towards performance portability through runtime adaptation for high-performance computing applications , 2010, ISC 2010.
[5] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard , 1994 .
[6] Torsten Hoefler,et al. Sparse collective operations for MPI , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[7] Jack J. Dongarra,et al. MPI Collective Algorithm Selection and Quadtree Encoding , 2006, PVM/MPI.
[8] Uday Bondhugula,et al. Effective automatic parallelization of stencil computations , 2007, PLDI '07.
[9] Vipin Kumar,et al. Parallel static and dynamic multi‐constraint graph partitioning , 2002, Concurr. Comput. Pract. Exp..
[10] Torsten Hoefler,et al. Sparse Non-blocking Collectives in Quantum Mechanical Calculations , 2008, PVM/MPI.
[11] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[12] Sanjay V. Rajopadhye,et al. Towards Optimal Multi-level Tiling for Stencil Computations , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[13] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[14] Xin Yuan,et al. STAR-MPI: self tuned adaptive routines for MPI collective operations , 2006, ICS '06.
[15] A. Krasnitz,et al. Studying Quarks and Gluons on MIMD Parallel Computers , 1991, Int. J. High Perform. Comput. Appl..
[16] Leonid Oliker,et al. Communication Requirements and Interconnect Optimization for High-End Scientific Applications , 2007, IEEE Transactions on Parallel and Distributed Systems.
[17] Rajeev Thakur,et al. Improving the Performance of Collective Operations in MPICH , 2003, PVM/MPI.
[18] Hubert Ritzdorf,et al. The scalable process topology interface of MPI 2.2 , 2011, Concurr. Comput. Pract. Exp..
[19] Chris J. Scheiman,et al. LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation , 1995, SPAA '95.
[20] Torsten Hoefler,et al. Group Operation Assembly Language - A Flexible Way to Express Collective Communication , 2009, 2009 International Conference on Parallel Processing.
[21] Jeffrey S. Vetter,et al. Communication characteristics of large-scale scientific applications for contemporary cluster architectures , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[22] Alok N. Choudhary,et al. Automatic optimization of communication in compiling out-of-core stencil codes , 1996, ICS '96.
[23] Gundolf Haase,et al. Parallel Algebraic Multigrid Methods on Distributed Memory Computers , 2002, SIAM J. Sci. Comput..
[24] Amotz Bar-Noy,et al. Designing broadcasting algorithms in the postal model for message-passing systems , 2005, Mathematical systems theory.
[25] Wei Shyy,et al. Lattice Boltzmann Method for 3-D Flows with Curved Boundary , 2000 .
[26] Joel H. Saltz,et al. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures , 1994, J. Parallel Distributed Comput..
[27] Sergei Gorlatch,et al. Send-receive considered harmful: Myths and realities of message passing , 2004, TOPLAS.
[28] Philip Heidelberger,et al. Optimization of applications with non-blocking neighborhood collectives via multisends on the Blue Gene/P supercomputer , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[29] D. J. A. Welsh,et al. An upper bound for the chromatic number of a graph and its application to timetabling problems , 1967, Comput. J..
[30] Jesper Larsson Träff,et al. Two-tree algorithms for full bandwidth broadcast, reduction and scan , 2009, Parallel Comput..
[31] Jehoshua Bruck,et al. Efficient algorithms for all-to-all communications in multi-port message-passing systems , 1994, SPAA '94.
[32] Larry Kaplan,et al. The Gemini System Interconnect , 2010, 2010 18th IEEE Symposium on High Performance Interconnects.
[33] B. Bollobás. The evolution of random graphs , 1984 .
[34] Eli Upfal,et al. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems , 1997, IEEE Trans. Parallel Distributed Syst..
[35] P. Erdős,et al. On the evolution of random graphs , 1984 .
[36] George L.-T. Chiu,et al. Overview of the Blue Gene/L system architecture , 2005, IBM J. Res. Dev..
[37] Michael Woodacre. The SGI® Altix 3000 Global Shared-Memory Architecture , 2003 .
[38] William C. Skamarock,et al. A time-split nonhydrostatic atmospheric model for weather research and forecasting applications , 2008, J. Comput. Phys..
[39] Paul D. Gader,et al. Image algebra techniques for parallel image processing , 1987 .
[40] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard , 1994 .