Optimizing Process-to-Core Mappings for Application Level Multi-dimensional MPI Communications
[1] Robert A. van de Geijn, et al. On the Efficiency of Global Combine Algorithms for 2-D Meshes With Wormhole Routing, 1993.
[2] Rajeev Thakur, et al. Optimization of Collective Communication Operations in MPICH, 2005, Int. J. High Perform. Comput. Appl.
[3] Robert A. van de Geijn, et al. Global Combine Algorithms for 2-D Meshes with Wormhole Routing, 1995, J. Parallel Distributed Comput.
[4] Georg Hager, et al. Communication Characteristics and Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-core SMP Nodes, 2009.
[5] Laxmikant V. Kale, et al. Automating Topology Aware Mapping for Supercomputers, 2010.
[6] Amith R. Mamidala, et al. MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics, 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).
[7] Hui Liu, et al. Optimizing Process-to-Core Mappings for Two Dimensional Broadcast/Reduce on Multicore Architectures, 2011, 2011 International Conference on Parallel Processing.
[8] Xin Yuan, et al. Bandwidth Efficient All-to-All Broadcast on Switched Clusters, 2005, 2005 IEEE International Conference on Cluster Computing.
[9] James Demmel, et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance, 1995, PARA.
[10] Jack Dongarra, et al. MPI: The Complete Reference, 1996.
[11] Dhabaleswar K. Panda, et al. Designing multi-leader-based Allgather algorithms for multi-core clusters, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[12] Robert A. van de Geijn, et al. Collective communication on architectures that support simultaneous communication over multiple links, 2006, PPoPP '06.
[13] Galen M. Shipman, et al. MPI Support for Multi-core Architectures: Optimized Shared Memory Collectives, 2008, PVM/MPI.
[14] Xin Yuan, et al. An MPI tool for automatically discovering the switch level topologies of Ethernet clusters, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[15] Rajeev Rastogi, et al. Topology discovery in heterogeneous IP networks: the NetInventory system, 2004, IEEE/ACM Transactions on Networking.
[16] S. Lennart Johnsson, et al. Distributed Routing Algorithms for Broadcasting and Personalized Communication in Hypercubes, 1986, ICPP.
[17] Dhabaleswar K. Panda, et al. Designing topology-aware collective communication algorithms for large scale InfiniBand clusters: Case studies with Scatter and Gather, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW).
[18] Zizhong Chen, et al. Highly Scalable Self-Healing Algorithms for High Performance Scientific Computing, 2009, IEEE Transactions on Computers.
[19] Dhabaleswar K. Panda, et al. Understanding the Impact of Multi-Core Architecture in Cluster Computing: A Case Study with Intel Dual-Core System, 2007, Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07).
[20] Jesper Larsson Träff, et al. Full Bandwidth Broadcast, Reduction and Scan with Only Two Trees, 2007, PVM/MPI.
[21] Jarek Nieplocha, et al. Topology-aware tile mapping for clusters of SMPs, 2006, CF '06.
[22] Samuel Williams, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2009, Parallel Comput.
[23] George Bosilca, et al. Locality and Topology Aware Intra-node Communication among Multicore CPUs, 2010, EuroMPI.
[24] Thomas Rauber, et al. Optimizing MPI collective communication by orthogonal structures, 2006, Cluster Computing.