Protocols for Fully Offloaded Collective Operations on Accelerated Network Adapters
Torsten Hoefler | Timo Schneider | Brian W. Barrett | Ron Brightwell | Ryan E. Grant
[1] Sayantan Sur,et al. Design and Evaluation of Generalized Collective Communication Primitives with Overlap Using ConnectX-2 Offload Engine , 2010, 2010 18th IEEE Symposium on High Performance Interconnects.
[2] Stephen W. Poole,et al. Overlapping computation and communication: Barrier algorithms and ConnectX-2 CORE-Direct capabilities , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[3] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard , 1994 .
[4] Richard L. Graham,et al. Open MPI: A Flexible High Performance MPI , 2005, PPAM.
[5] Brian W. Barrett,et al. The Portals 4.3 Network Programming Interface , 2014 .
[6] Jehoshua Bruck,et al. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems , 1997 .
[7] Eli Upfal,et al. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems , 1997, IEEE Trans. Parallel Distributed Syst..
[8] Torsten Hoefler,et al. Runtime detection and optimization of collective communication patterns , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[9] J. van Leeuwen,et al. Recent Advances in Parallel Virtual Machine and Message Passing Interface , 2002, Lecture Notes in Computer Science.
[10] Scott Pakin. Receiver-initiated message passing over RDMA Networks , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[11] Vinton G. Cerf,et al. A protocol for packet network intercommunication , 1974, CCRV.
[12] Duncan Roweth,et al. Optimised Global Reduction on QsNetII , 2005, Hot Interconnects.
[13] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard , 1994 .
[14] Karl S. Hemmert,et al. Using Triggered Operations to Offload Rendezvous Messages , 2011, EuroMPI.
[15] Karl S. Hemmert,et al. Using Triggered Operations to Offload Collective Communication Operations , 2010, EuroMPI.
[16] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..
[17] Manjunath Gorentla Venkata,et al. ConnectX-2 CORE-Direct Enabled Asynchronous Broadcast Collective Communications , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[18] Jesper Larsson Träff,et al. Optimal Broadcast for Fully Connected Networks , 2005, HPCC.
[19] D. Panda,et al. High Performance RDMA Based All-to-All Broadcast for InfiniBand Clusters , 2005, HiPC.
[20] Steve Poole,et al. ConnectX-2 InfiniBand Management Queues: First Investigation of the New Support for Network Offloaded Collective Operations , 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[21] William Gropp,et al. MPICH2: A New Start for MPI Implementations , 2002, PVM/MPI.
[22] Dhabaleswar K. Panda,et al. Fast NIC-based barrier over Myrinet/GM , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.
[23] Torsten Hoefler,et al. Group Operation Assembly Language - A Flexible Way to Express Collective Communication , 2009, 2009 International Conference on Parallel Processing.
[24] Torsten Hoefler,et al. Optimization principles for collective neighborhood communications , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[25] Dhabaleswar K. Panda,et al. Broadcast/Multicast over Myrinet Using NIC-Assisted Multidestination Messages , 2000, CANPC.
[26] Yutaka Ishikawa,et al. Design of Kernel-Level Asynchronous Collective Communication , 2010, EuroMPI.
[27] D. Roweth,et al. Optimised global reduction on QsNetII , 2005, 13th Symposium on High Performance Interconnects (HOTI'05).
[28] Torsten Hoefler,et al. Message progression in parallel computing - to thread or not to thread? , 2008, 2008 IEEE International Conference on Cluster Computing.
[29] Torsten Hoefler,et al. Kernel-Based Offload of Collective Operations - Implementation, Evaluation and Lessons Learned , 2011, Euro-Par.
[30] Torsten Hoefler,et al. Implementation and performance analysis of non-blocking collective operations for MPI , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[31] Karl S. Hemmert,et al. Enabling Flexible Collective Communication Offload with Triggered Operations , 2011, 2011 IEEE 19th Annual Symposium on High Performance Interconnects.
[32] Dhabaleswar K. Panda,et al. High Performance RDMA-Based MPI Implementation over InfiniBand , 2003, ICS '03.