Design and Analysis of Pipelined Broadcast Algorithms for the All-Port Interlaced Bypass Torus Networks
[1] Dennis Gannon,et al. On the Impact of Communication Complexity on the Design of Parallel Numerical Algorithms , 1984, IEEE Transactions on Computers.
[2] Amith R. Mamidala,et al. MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics , 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).
[3] Sathish S. Vadhiyar,et al. Automatically Tuned Collective Communications , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[4] G. Johnson,et al. A Performance Comparison Through Benchmarking and Modeling of Three Leading Supercomputers: Blue Gene/L, Red Storm, and Purple , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[5] Erich Strohmaier,et al. High-performance computing: clusters, constellations, MPPs, and future directions , 2003, Comput. Sci. Eng..
[6] Rolf Rabenseifner,et al. Automatic Profiling of MPI Applications with Hardware Performance Counters , 1999, PVM/MPI.
[7] Sudhakar Yalamanchili,et al. Interconnection Networks: An Engineering Approach , 2002 .
[8] José Duato,et al. Adaptive bubble router: a design to improve performance in torus networks , 1999, Proceedings of the 1999 International Conference on Parallel Processing.
[9] Viktor K. Prasanna,et al. Portable and Scalable Algorithm for Irregular All-to-All Communication , 2002, J. Parallel Distributed Comput..
[10] Robert A. van de Geijn,et al. Broadcasting on Meshes with Wormhole Routing , 1996, J. Parallel Distributed Comput..
[11] Rajeev Thakur,et al. All-to-all communication on meshes with wormhole routing , 1994, Proceedings of 8th International Parallel Processing Symposium.
[12] Ben H. H. Juurlink,et al. Gossiping on Meshes and Tori , 1998, IEEE Trans. Parallel Distributed Syst..
[13] Robert S. Germain,et al. Performance Measurements of the 3D FFT on the Blue Gene/L Supercomputer , 2005, Euro-Par.
[14] Robert A. van de Geijn,et al. Fast Collective Communication Libraries, Please , 1995 .
[15] David F. Heidel,et al. An Overview of the BlueGene/L Supercomputer , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[16] Sathish S. Vadhiyar,et al. ACCT: Automatic Collective Communications Tuning , 2000, PVM/MPI.
[17] Robert A. van de Geijn,et al. On optimizing collective communication , 2004, 2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935).
[18] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..
[19] Robert A. van de Geijn,et al. Global Combine Algorithms for 2-D Meshes with Wormhole Routing , 1995, J. Parallel Distributed Comput..
[20] Cruz Izu,et al. The Adaptive Bubble Router , 2001, J. Parallel Distributed Comput..
[21] William Gropp,et al. Design and implementation of message-passing services for the Blue Gene/L supercomputer , 2005, IBM J. Res. Dev..
[22] Stéphane Pérennes,et al. All-to-all broadcast in torus with wormhole-like routing , 1995, Proceedings.Seventh IEEE Symposium on Parallel and Distributed Processing.
[23] Philip Heidelberger,et al. Optimization of All-to-All Communication on the Blue Gene/L Supercomputer , 2008, 2008 37th International Conference on Parallel Processing.
[24] Jack J. Dongarra,et al. Performance analysis of MPI collective operations , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[25] Xin Yuan,et al. Automatic generation and tuning of MPI collective communication routines , 2005, ICS '05.
[26] Xin Yuan,et al. STAR-MPI: self tuned adaptive routines for MPI collective operations , 2006, ICS '06.
[27] Ulrich Meyer,et al. Time-independent gossiping on full-port tori , 1998 .
[28] Yuanyuan Yang,et al. Near-optimal all-to-all broadcast in multidimensional all-port meshes and tori , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.
[29] Lars Paul Huse. Collective Communication on Dedicated Clusters of Workstations , 1999, PVM/MPI.
[30] Steven L. Scott,et al. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus , 1996 .
[31] Philip Heidelberger,et al. Blue Gene/L torus interconnection network , 2005, IBM J. Res. Dev..
[32] Robert A. van de Geijn,et al. Collective communication on architectures that support simultaneous communication over multiple links , 2006, PPoPP '06.
[33] Rajeev Thakur,et al. Improving the Performance of Collective Operations in MPICH , 2003, PVM/MPI.
[34] William J. Dally,et al. Principles and Practices of Interconnection Networks , 2004 .
[35] Peng Zhang,et al. Interlacing Bypass Rings to Torus Networks for More Efficient Networks , 2011, IEEE Transactions on Parallel and Distributed Systems.
[36] Yuanyuan Yang,et al. Pipelined All-to-All Broadcast in All-Port Meshes and Tori , 2001, IEEE Trans. Computers.
[37] Henri E. Bal,et al. Bandwidth-efficient collective communication for clustered wide area systems , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[38] Amith R. Mamidala,et al. MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations , 2009, 2009 17th IEEE Symposium on High Performance Interconnects.
[39] Philip Heidelberger,et al. Optimization of MPI collective communication on BlueGene/L systems , 2005, ICS '05.
[40] Jack J. Dongarra,et al. Performance Analysis of MPI Collective Operations , 2005, IPDPS.
[41] Jack J. Dongarra,et al. The LINPACK Benchmark: past, present and future , 2003, Concurr. Comput. Pract. Exp..