Scalable collective message-passing algorithms
[1] Laxmikant V. Kalé,et al. Automating Topology Aware Mapping for Supercomputers , 2010 .
[2] Felix Wolf,et al. Parallel Sorting with Minimal Data , 2011, EuroMPI.
[3] Eitan Zahavi,et al. Fat-Trees Routing and Node Ordering Providing Contention Free Traffic for MPI Global Collectives , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[4] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..
[5] George Bosilca,et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation , 2004, PVM/MPI.
[6] Yogish Sabharwal,et al. Optimal bucket algorithms for large MPI collectives on torus interconnects , 2010, ICS '10.
[7] Jonathan Schaeffer,et al. Parallel Sorting by Regular Sampling , 1992, J. Parallel Distributed Comput..
[8] Rajeev Thakur,et al. Improving the Performance of Collective Operations in MPICH , 2003, PVM/MPI.
[9] Ramesh Subramonian,et al. LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.
[10] Torsten Hoefler,et al. Multistage switches are not crossbars: Effects of static routing in high-performance networks , 2008, 2008 IEEE International Conference on Cluster Computing.
[11] Jesper Larsson Träff,et al. A Pipelined Algorithm for Large, Irregular All-Gather Problems , 2010, Int. J. High Perform. Comput. Appl..
[12] Friedhelm Meyer auf der Heide,et al. Optimal broadcast on parallel locality models , 2003, J. Discrete Algorithms.
[13] David S. Wise. Ahnentafel Indexing into Morton-Ordered Arrays, or Matrix Locality for Free , 2000, Euro-Par.
[14] Herb Sutter,et al. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software , 2013 .
[15] Tao Yang,et al. Optimizing threaded MPI execution on SMP clusters , 2001, ICS '01.
[16] Richard M. Karp,et al. Optimal broadcast and summation in the LogP model , 1993, SPAA '93.
[17] Torsten Hoefler,et al. The PERCS High-Performance Interconnect , 2010, 2010 18th IEEE Symposium on High Performance Interconnects.
[18] Rajendra Akerkar,et al. Reconfigurable Architectures and Algorithms: A Research Survey , 2009, Int. J. Comput. Sci. Appl..
[19] Anthony Skjellum,et al. A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard , 1996, Parallel Comput..
[20] S. W. Song,et al. A Note on Parallel Selection on Coarse-Grained Multicomputers , 1999, Algorithmica.
[21] Torsten Hoefler,et al. Characterizing the Influence of System Noise on Large-Scale Applications by Simulation , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] Kamil Iskra,et al. Characterizing the Performance of “Big Memory” on Blue Gene Linux , 2009, 2009 International Conference on Parallel Processing Workshops.
[23] Philip Heidelberger,et al. Optimization of MPI collective communication on BlueGene/L systems , 2005, ICS '05.
[24] IBM Blue Gene Team. Overview of the IBM Blue Gene/P Project , 2008, IBM J. Res. Dev..
[25] Jehoshua Bruck,et al. CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers , 1995, IEEE Trans. Parallel Distributed Syst..
[26] Viral B. Shah,et al. A Novel Parallel Sorting Algorithm for Contemporary Architectures , 2007 .
[27] J. Watts,et al. Interprocessor collective communication library (InterCom) , 1994, Proceedings of IEEE Scalable High Performance Computing Conference.
[28] Robert A. van de Geijn,et al. Global combine on mesh architectures with wormhole routing , 1993, [1993] Proceedings Seventh International Parallel Processing Symposium.
[29] William Gropp,et al. User's Guide for mpich, a Portable Implementation of MPI Version 1.2.2 , 1996 .
[30] Torsten Hoefler,et al. Sparse collective operations for MPI , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[31] Henri E. Bal,et al. MagPIe: MPI's collective communication operations for clustered wide area systems , 1999, PPoPP '99.
[32] Xin Yuan,et al. Bandwidth Efficient All-reduce Operation on Tree Topologies , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[33] Torsten Hoefler,et al. Adaptive Routing Strategies for Modern High Performance Networks , 2008, 2008 16th IEEE Symposium on High Performance Interconnects.
[34] Amith R. Mamidala,et al. MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations , 2009, Hot Interconnects.
[35] Laxmikant V. Kalé,et al. Highly scalable parallel sorting , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[36] Richard Stong. Hamilton decompositions of cartesian products of graphs , 1991, Discret. Math..
[37] Kyung-Yong Chwa,et al. Optimal Embedding of Multiple Directed Hamiltonian Rings into d-dimensional Meshes , 2000, J. Parallel Distributed Comput..
[38] Gianfranco Bilardi,et al. Broadcast and Associative Operations on Fat-Trees , 1997, Euro-Par.
[39] Robert A. van de Geijn,et al. Optimal Broadcasting in Mesh-Connected Architectures , 1991 .
[40] Bruce M. Maggs,et al. Communication-efficient parallel algorithms for distributed random-access machines , 1988, Algorithmica.
[41] Jung Ho Ahn,et al. HyperX: topology, routing, and packaging of efficient large-scale networks , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[42] William Gropp,et al. A Scalable MPI_Comm_split Algorithm for Exascale Computing , 2010, EuroMPI.
[43] Bronis R. de Supinski,et al. Exascale Algorithms for Generalized MPI_Comm_split , 2011, EuroMPI.
[44] Laxmikant V. Kalé,et al. Scaling an optimistic parallel simulation of large-scale interconnection networks , 2005, Proceedings of the Winter Simulation Conference, 2005..