MPI and UPC broadcast, scatter and gather algorithms in Xeon Phi
[1] Ying Qian, et al. Design and Evaluation of Efficient Collective Communications on Modern Interconnects and Multi-core Clusters, 2010.
[2] Dhabaleswar K. Panda, et al. Efficient collective operations using remote memory operations on VIA-based clusters, 2003, Proceedings International Parallel and Distributed Processing Symposium.
[3] Katherine Yelick, et al. Optimizing collective communication on multicores, 2009.
[4] Tarek A. El-Ghazawi, et al. Benchmarking parallel compilers: A UPC case study, 2006, Future Gener. Comput. Syst.
[5] Galen M. Shipman, et al. X-SRQ- Improving Scalability and Performance of Multi-core InfiniBand Clusters, 2008, PVM/MPI.
[6] Alejandro Rico, et al. Tibidabo: Making the case for an ARM-based HPC system, 2014, Future Gener. Comput. Syst.
[7] Dhabaleswar K. Panda, et al. Scalable MPI design over InfiniBand using eXtended Reliable Connection, 2008, 2008 IEEE International Conference on Cluster Computing.
[8] Jiulong Shan, et al. Single Data Copying for MPI Communication Optimization on Shared Memory System, 2007, International Conference on Computational Science.
[9] Amith R. Mamidala, et al. Scaling alltoall collective on multi-core systems, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[10] Dhabaleswar K. Panda, et al. Efficient Intra-node Communication on Intel-MIC Clusters, 2013, 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing.
[11] Dhabaleswar K. Panda, et al. GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation, 2014, IEEE Transactions on Parallel and Distributed Systems.
[12] Jorge González-Domínguez, et al. Scalable PGAS collective operations in NUMA clusters, 2014, Cluster Computing.
[13] Hyun-Wook Jin, et al. High performance MPI-2 one-sided communication over InfiniBand, 2004, IEEE International Symposium on Cluster Computing and the Grid, 2004. CCGrid 2004.
[14] Ashok Srinivasan, et al. Optimization of Collective Communication in Intra-cell MPI, 2007, HiPC.
[15] Dhabaleswar K. Panda, et al. Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters, 2013, 2013 IEEE 21st Annual Symposium on High-Performance Interconnects.
[16] Sabela Ramos, et al. Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi, 2013, HPDC.
[17] Torsten Hoefler, et al. A practically constant-time MPI Broadcast Algorithm for large-scale InfiniBand Clusters with Multicast, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[18] Nagiza F. Samatova, et al. Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data, 2014, IEEE Transactions on Parallel and Distributed Systems.
[19] Amith R. Mamidala, et al. MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics, 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).
[20] Jianlong Zhong, et al. Network Performance Aware MPI Collective Communication Operations in the Cloud, 2015, IEEE Transactions on Parallel and Distributed Systems.
[21] Katherine A. Yelick, et al. Tuning collective communication for Partitioned Global Address Space programming models, 2011, Parallel Comput.
[22] Sathish S. Vadhiyar, et al. Automatically Tuned Collective Communications, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[23] Fernando Obelleiro Basteiro, et al. High scalability multipole method. Solving half billion of unknowns, 2009, Computer Science - Research and Development.
[24] Dhabaleswar K. Panda, et al. MVAPICH-PRISM: A proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[25] Dhabaleswar K. Panda, et al. Designing topology-aware collective communication algorithms for large scale InfiniBand clusters: Case studies with Scatter and Gather, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[26] Abhinav Vishnu, et al. A performance comparison of current HPC systems: Blue Gene/Q, Cray XE6 and InfiniBand systems, 2014, Future Gener. Comput. Syst.
[27] Dhabaleswar K. Panda, et al. Designing multi-leader-based Allgather algorithms for multi-core clusters, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[28] Kevin T. Pedretti, et al. Optimizing Multi-core MPI Collectives with SMARTMAP, 2009, 2009 International Conference on Parallel Processing Workshops.
[29] Xiaofang Zhao, et al. Performance analysis and optimization of MPI collective operations on multi-core clusters, 2009, The Journal of Supercomputing.
[30] Raymond Namyst, et al. A multithreaded communication engine for multicore architectures, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[31] Dhabaleswar K. Panda, et al. UPC on MIC: Early Experiences with Native and Symmetric Modes, 2013.
[32] Robert A. van de Geijn, et al. Collective communication on architectures that support simultaneous communication over multiple links, 2006, PPoPP '06.
[33] Rajeev Thakur, et al. Optimization of Collective Communication Operations in MPICH, 2005, Int. J. High Perform. Comput. Appl.
[34] D. Panda, et al. High Performance RDMA Based All-to-All Broadcast for InfiniBand Clusters, 2005, HiPC.
[35] Galen M. Shipman, et al. MPI Support for Multi-core Architectures: Optimized Shared Memory Collectives, 2008, PVM/MPI.