Leveraging Network-level parallelism with Multiple Process-Endpoints for MPI Broadcast
Hari Subramoni | Amit Ruhela | Jahanzeb Hashmi | Bharath Ramesh | Sourav Chakraborty | Dhabaleswar K. Panda
[1] Ajay D. Kshemkalyani, et al. Dynamic multiroot, multiquery processing based on data sharing in sensor networks, 2010, TOSN.
[2] Xin Yuan, et al. Pipelined broadcast on Ethernet switched clusters, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[3] Torsten Hoefler, et al. NUMA-aware shared-memory collective communication for MPI, 2013, HPDC.
[4] Dhabaleswar K. Panda, et al. Efficient design for MPI asynchronous progress without dedicated resources, 2019, Parallel Comput.
[5] Jérôme Vienne, et al. Benefits of Cross Memory Attach for MPI libraries on HPC Clusters, 2014, XSEDE '14.
[6] Dhabaleswar K. Panda, et al. Scalable Reduction Collectives with Data Partitioning-based Multi-Leader Design, 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[7] Dhabaleswar K. Panda, et al. The MVAPICH Project: Evolution and Sustainability of an Open Source Production Quality MPI Library for HPC, 2013.
[8] Dhabaleswar K. Panda, et al. Fast collective operations using shared and remote memory access protocols on clusters, 2003, Proceedings International Parallel and Distributed Processing Symposium.
[9] Emmanuel Jeannot, et al. A hierarchical model to manage hardware topology in MPI applications, 2017, EuroMPI/USA.
[10] Jeffrey S. Vetter, et al. Statistical scalability analysis of communication operations in distributed applications, 2001, PPoPP '01.
[11] Xin Yuan, et al. Efficient MPI Bcast across different process arrival patterns, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[12] Thomas Hérault, et al. MPI Applications on Grids: A Topology Aware Approach, 2009, Euro-Par.
[13] Dhabaleswar K. Panda, et al. Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast, 2019, IEEE Transactions on Parallel and Distributed Systems.
[14] Keith D. Underwood, et al. Intel® Omni-Path Architecture: Enabling Scalable, High Performance Fabrics, 2015, 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects.
[15] Manjunath Gorentla Venkata, et al. Collective Framework and Performance Optimizations to Open MPI for Cray XT Platforms, 2011.
[16] Manjunath Gorentla Venkata, et al. Design and Implementation of Broadcast Algorithms for Extreme-Scale Systems, 2011, 2011 IEEE International Conference on Cluster Computing.
[17] Amith R. Mamidala, et al. MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics, 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).
[18] Torsten Hoefler, et al. Corrected trees for reliable group communication, 2019, PPoPP.
[19] Dhabaleswar K. Panda, et al. Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores, 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[20] Xiaohui Wei, et al. Latency-Balanced Optimization of MPI Collective Communication across Multi-clusters, 2013, 2013 8th ChinaGrid Annual Conference.
[21] Hao Zhu, et al. Hierarchical Collectives in MPICH2, 2009, PVM/MPI.