A Novel Functional Partitioning Approach to Design High-Performance MPI-3 Non-blocking Alltoallv Collective on Multi-core Systems
Dhabaleswar K. Panda | Hari Subramoni | Karen Tomko | Dmitry Pekurovsky | Krishna Chaitanya Kandalla
[1] Yutaka Ishikawa, et al. Design of Kernel-Level Asynchronous Collective Communication, 2010, EuroMPI.
[2] F. Petrini, et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[3] Terry Jones, et al. Impacts of Operating Systems on the Scalability of Parallel Applications, 2003.
[4] Sayantan Sur, et al. High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT, 2011, Computer Science - Research and Development.
[5] MPI Forum. MPI: A Message-Passing Interface, 1994.
[6] Torsten Hoefler, et al. Characterizing the Influence of System Noise on Large-Scale Applications by Simulation, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[7] J. C. Vassilicos, et al. A numerical strategy to combine high-order schemes, complex geometry and parallel computing for high resolution DNS of fractal generated turbulence, 2010.
[8] Manjunath Gorentla Venkata, et al. ConnectX-2 CORE-Direct Enabled Asynchronous Broadcast Collective Communications, 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[9] Torsten Hoefler, et al. Message progression in parallel computing - to thread or not to thread?, 2008, 2008 IEEE International Conference on Cluster Computing.
[10] Torsten Hoefler, et al. A Case for Non-blocking Collective Operations, 2006, ISPA Workshops.
[11] The MPI Forum, et al. MPI: a message passing interface, 1993, Supercomputing '93.
[12] Torsten Hoefler, et al. Optimization principles for collective neighborhood communications, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Torsten Hoefler, et al. Kernel-Based Offload of Collective Operations - Implementation, Evaluation and Lessons Learned, 2011, Euro-Par.
[14] Sayantan Sur, et al. Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters Using Shared Memory Backed Windows, 2011, EuroMPI.
[15] Darren J. Kerbyson, et al. Efficient offloading of collective communications in large-scale systems, 2007, 2007 IEEE International Conference on Cluster Computing.
[16] Dhabaleswar K. Panda, et al. Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[17] Torsten Hoefler, et al. Group Operation Assembly Language - A Flexible Way to Express Collective Communication, 2009, 2009 International Conference on Parallel Processing.
[18] Xin Yuan, et al. Efficient MPI Bcast across different process arrival patterns, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[19] Amith R. Mamidala, et al. Looking under the hood of the IBM Blue Gene/Q network, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.