Design and Implementation of Broadcast Algorithms for Extreme-Scale Systems