Improved MPI collectives for MPI processes in shared address spaces
Torsten Hoefler | Shigang Li | Marc Snir | Chungjin Hu
[1] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl.
[2] Dhabaleswar K. Panda,et al. Fast collective operations using shared and remote memory access protocols on clusters , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[3] Rajeev Thakur,et al. Improving the Performance of Collective Operations in MPICH , 2003, PVM/MPI.
[4] Torsten Hoefler,et al. NUMA-aware shared-memory collective communication for MPI , 2013, HPDC.
[5] Nian-Feng Tzeng,et al. Distributing Hot-Spot Addressing in Large-Scale Multiprocessors , 1987, IEEE Transactions on Computers.
[6] Hao Zhu,et al. Hierarchical Collectives in MPICH2 , 2009, PVM/MPI.
[7] Karl Feind,et al. An Ultrahigh Performance MPI Implementation on SGI® ccNUMA Altix® Systems , 2006.
[8] Guillaume Mercier,et al. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications , 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[9] Alan Wagner,et al. FG-MPI: Fine-grain MPI for multicore and clusters , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[10] Matthias S. Müller,et al. Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[11] Rolf Rabenseifner,et al. Optimization of Collective Reduction Operations , 2004, International Conference on Computational Science.
[12] Torsten Hoefler,et al. Fast barrier synchronization for InfiniBand™ , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[13] Henri E. Bal,et al. MagPIe: MPI's collective communication operations for clustered wide area systems , 1999, PPoPP '99.
[14] Michael L. Scott,et al. Synchronization without contention , 1991, ASPLOS IV.
[15] Galen M. Shipman,et al. MPI Support for Multi-core Architectures: Optimized Shared Memory Collectives , 2008, PVM/MPI.
[16] Patrick Carribault,et al. MPC-MPI: An MPI Implementation Reducing the Overall Memory Consumption , 2009, PVM/MPI.
[17] Tao Yang,et al. Program transformation and runtime support for threaded MPI execution on shared-memory machines , 2000, TOPLAS.
[18] Torsten Hoefler,et al. Hybrid MPI: Efficient message passing for multi-core systems , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[19] Marc Snir,et al. Optimizing the Barnes-Hut algorithm in UPC , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[20] Laxmikant V. Kalé,et al. Automatic MPI to AMPI Program Transformation Using Photran , 2010, Euro-Par Workshops.
[21] Torsten Hoefler,et al. Fast barrier synchronization for InfiniBand , 2006.
[22] Debra Hensgen,et al. Two algorithms for barrier synchronization , 1988, International Journal of Parallel Programming.
[23] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard , 1994.
[24] Steve Sistare,et al. Optimization of MPI Collectives on Clusters of Large-Scale SMP's , 1999, SC.
[25] Xipeng Shen,et al. Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? , 2010, PPoPP '10.
[26] Torsten Hoefler,et al. Ownership passing: efficient distributed memory programming on multi-core systems , 2013, PPoPP '13.
[27] Amith R. Mamidala,et al. MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics , 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).
[28] Katherine A. Yelick,et al. Hybrid PGAS runtime support for multicore nodes , 2010, PGAS '10.
[29] Mark A. Taylor,et al. Architecture of LA-MPI, a network-fault-tolerant MPI , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[30] Tao Yang,et al. Optimizing threaded MPI execution on SMP clusters , 2001, ICS '01.