Accelerating Allreduce Operation: A Switch-Based Solution
Dawei Wang | Zheng Cao | Ninghui Sun | Xuejun An | Nongda Hu
[1] Amith R. Mamidala, et al. MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations, 2009, 2009 17th IEEE Symposium on High Performance Interconnects.
[2] Dhabaleswar K. Panda, et al. Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[3] Dhabaleswar K. Panda, et al. Efficient collective operations using remote memory operations on VIA-based clusters, 2003, Proceedings International Parallel and Distributed Processing Symposium.
[4] William Jalby, et al. Improving MPI communication overlap with collaborative polling, 2012, Computing.
[5] Sayantan Sur, et al. Design and Evaluation of Generalized Collective Communication Primitives with Overlap Using ConnectX-2 Offload Engine, 2010, 2010 18th IEEE Symposium on High Performance Interconnects.
[6] Dhabaleswar K. Panda, et al. Scalable NIC-based Reduction on Large-scale Clusters, 2003, International Conference on Software Composition.
[7] Rolf Rabenseifner, et al. Optimization of Collective Reduction Operations, 2004, International Conference on Computational Science.
[8] Ninghui Sun. HPP: an architecture for high performance and utility computing, 2007, China HPC.
[9] Peter Schelkens, et al. An Investigation into the Performance of Reduction Algorithms under Load Imbalance, 2012, Euro-Par.
[10] Xin Yuan, et al. Bandwidth Efficient All-reduce Operation on Tree Topologies, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[11] Xin Yuan, et al. Bandwidth optimal all-reduce algorithms for clusters of workstations, 2009, J. Parallel Distributed Comput.
[12] Jesper Larsson Träff, et al. More Efficient Reduction Algorithms for Non-Power-of-Two Number of Processors in Message-Passing Parallel Systems, 2004, PVM/MPI.
[13] Cao Zheng. Design of Barrier Network of Dawning 5000 High Performance Computer, 2008.
[14] Philip Heidelberger, et al. Optimization of MPI collective communication on BlueGene/L systems, 2005, ICS '05.
[15] D. Panda, et al. Efficient Barrier and Allreduce on InfiniBand Clusters using Hardware Multicast and Adaptive Algorithms, 2004.
[16] Amith R. Mamidala, et al. Efficient Barrier and Allreduce on Infiniband clusters using multicast and adaptive algorithms, 2004, 2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935).
[17] Gábor Dózsa, et al. Efficient Implementation of Allreduce on BlueGene/L Collective Network, 2005, PVM/MPI.
[18] Robert A. van de Geijn, et al. Collective communication: theory, practice, and experience, 2007, Concurr. Comput. Pract. Exp.
[19] Kenichi Miura, et al. The design of ultra scalable MPI collective communication on the K computer, 2012, Computer Science - Research and Development.
[20] R. Rabenseifner, et al. Automatic MPI Counter Profiling of All Users: First Results on a CRAY T3E 900-512, 2004.
[21] Keith D. Underwood, et al. Implications of application usage characteristics for collective communication offload, 2006, Int. J. High Perform. Comput. Netw.
[22] Karl S. Hemmert, et al. Enabling Flexible Collective Communication Offload with Triggered Operations, 2011, 2011 IEEE 19th Annual Symposium on High Performance Interconnects.
[23] Kai Li, et al. HPP: An Architecture for High Performance and Utility Computing, 2009.
[24] D. Panda, et al. NIC-Based Reduction in Myrinet Clusters: Is It Beneficial?, 2003.
[25] Jack J. Dongarra, et al. Decision Trees and MPI Collective Algorithm Selection Problem, 2007, Euro-Par.
[26] George Bosilca, et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, 2004, PVM/MPI.
[27] Rajeev Thakur, et al. Optimization of Collective Communication Operations in MPICH, 2005, Int. J. High Perform. Comput. Appl.