Design and Characterization of InfiniBand Hardware Tag Matching in MPI
Hari Subramoni | Mohammadreza Bayatpour | S. Mahdieh Ghazimirsaeed | Shulei Xu | Dhabaleswar K. Panda
[1] Anoop Gupta,et al. SPLASH: Stanford parallel applications for shared-memory, 1992, CARN.
[2] Message Passing Interface Forum. MPI: A message-passing interface standard, 1994.
[3] Philip K. McKinley,et al. Efficient collective operations with ATM network interface support , 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.
[4] Kees Verstoep,et al. Efficient reliable multicast on Myrinet , 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.
[5] Jack Dongarra,et al. Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface, 1997.
[6] Andreas Holzman. Recent Advances in Parallel Virtual Machine and Message Passing Interface , 2001, Lecture Notes in Computer Science.
[7] V. E. Henson,et al. BoomerAMG: a parallel algebraic multigrid solver and preconditioner, 2002.
[8] Leonid Oliker,et al. Message passing and shared address space parallelism on an SMP cluster, 2003, Parallel Comput.
[9] D.K. Panda,et al. Scalable NIC-based Reduction on Large-scale Clusters , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[10] Keith D. Underwood,et al. Evaluation of an Eager Protocol Optimization for MPI , 2003, PVM/MPI.
[11] Dhabaleswar K. Panda,et al. Design and implementation of MPICH2 over InfiniBand with RDMA support, 2003, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[12] Keith D. Underwood,et al. An analysis of NIC resource usage for offloading MPI, 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[13] Sayantan Sur,et al. Shared receive queue based scalable MPI design for InfiniBand clusters , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[14] Sayantan Sur,et al. RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits , 2006, PPoPP '06.
[15] Sayantan Sur,et al. Design and Evaluation of Generalized Collective Communication Primitives with Overlap Using ConnectX-2 Offload Engine , 2010, 2010 18th IEEE Symposium on High Performance Interconnects.
[16] Stephen W. Poole,et al. Overlapping computation and communication: Barrier algorithms and ConnectX-2 CORE-Direct capabilities , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[17] Karl S. Hemmert,et al. Using Triggered Operations to Offload Rendezvous Messages , 2011, EuroMPI.
[18] Ahmad Afsahi,et al. An Efficient MPI Message Queue Mechanism for Large-scale Jobs , 2012, 2012 IEEE 18th International Conference on Parallel and Distributed Systems.
[19] Keith D. Underwood,et al. Intel® Omni-path Architecture: Enabling Scalable, High Performance Fabrics , 2015, 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects.
[20] Dhabaleswar K. Panda,et al. Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters , 2015, ISC.
[21] Dhabaleswar K. Panda,et al. Adaptive and Dynamic Design for MPI Tag Matching , 2016, 2016 IEEE International Conference on Cluster Computing (CLUSTER).
[22] Dhabaleswar K. Panda,et al. Designing Dynamic and Adaptive MPI Point-to-Point Communication Protocols for Efficient Overlap of Computation and Communication , 2017, ISC.
[23] S. M. Ghazimirsaeed,et al. Accelerating MPI Message Matching by a Data Clustering Strategy, 2017.
[24] Dhabaleswar K. Panda,et al. Cooperative Rendezvous Protocols for Improved Performance and Overlap , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[25] Ryan E. Grant,et al. A Dedicated Message Matching Mechanism for Collective Communications , 2018, ICPP Workshops.
[26] Michael J. Levenhagen,et al. The Case for Semi-Permanent Cache Occupancy: Understanding the Impact of Data Locality on Network Processing , 2018, ICPP.
[27] Ryan E. Grant,et al. Fuzzy Matching: Hardware Accelerated MPI Communication Middleware , 2019, 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[28] Dhabaleswar K. Panda,et al. Designing a Profiling and Visualization Tool for Scalable and In-depth Analysis of High-Performance GPU Clusters , 2019, 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC).
[29] Ryan E. Grant,et al. A dynamic, unified design for dedicated message matching engines for collective and point-to-point communications, 2019, Parallel Comput.
[30] Ahmad Afsahi,et al. Communication‐aware message matching in MPI, 2018, Concurr. Comput. Pract. Exp.