Exploring the All-to-All Collective Optimization Space with ConnectX CORE-Direct
暂无分享,去创建一个
Manjunath Gorentla Venkata | Pavel Shamis | Richard L. Graham | Joshua Ladd | R. Graham | Pavel Shamis | Joshua Ladd
[1] Manjunath Gorentla Venkata,et al. Cheetah: A Framework for Scalable Hierarchical Collective Operations , 2011, 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[2] Sayantan Sur,et al. High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT , 2011, Computer Science - Research and Development.
[3] Steve Poole,et al. ConnectX-2 InfiniBand Management Queues: First Investigation of the New Support for Network Offloaded Collective Operations , 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[4] Scott Pakin,et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8, 192 Processors of ASCI Q , 2003, SC.
[5] Manjunath Gorentla Venkata,et al. ConnectX-2 CORE-Direct Enabled Asynchronous Broadcast Collective Communications , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[6] Ronald Mraz,et al. Reducing the variance of point to point transfers in the IBM 9076 parallel computer , 1994, Proceedings of Supercomputing '94.
[7] Eli Upfal,et al. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems , 1997, IEEE Trans. Parallel Distributed Syst..
[8] Roger W. Hockney,et al. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2 , 1994, Parallel Computing.
[9] George Bosilca,et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation , 2004, PVM/MPI.
[10] Jehoshua Bruck,et al. Efficient algorithms for all-to-all communications in multi-port message-passing systems , 1994, SPAA '94.
[11] Torsten Hoefler,et al. Optimizing non-blocking collective operations for infiniband , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[12] Ying Qian,et al. Design and Evaluation of Efficient Collective Communications on Modern Interconnects and Multi-core Clusters , 2010 .
[13] Christopher Wilson,et al. COMB: a portable benchmark suite for assessing MPI overlap , 2002, Proceedings. IEEE International Conference on Cluster Computing.
[14] D. Panda,et al. High Performance RDMA Based All-to-All Broadcast for InfiniBand Clusters , 2005, HiPC.
[15] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..