Evaluating the performance of the allreduce collective operation on clusters. Approach and results
[1] Scott Pakin,et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q , 2003, SC.
[2] Michael L. Scott,et al. Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.
[3] Sathish S. Vadhiyar,et al. Automatically Tuned Collective Communications , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[4] Jeffrey S. Vetter,et al. Communication characteristics of large-scale scientific applications for contemporary cluster architectures , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[5] Vaidy S. Sunderam,et al. PVM: A Framework for Parallel Distributed Computing , 1990, Concurr. Pract. Exp..
[6] Jack J. Dongarra,et al. Review of Performance Analysis Tools for MPI Parallel Programs , 2001, PVM/MPI.
[7] Sanjeev Kumar,et al. Evaluating synchronization on shared address space multiprocessors: methodology and performance , 1999, SIGMETRICS '99.
[8] Terry Jones,et al. Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[9] James C. Hoe,et al. MPI-StarT: Delivering Network Performance to Numerical Applications , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[10] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard , 1994 .
[11] J. L. Traff. Implementing the MPI Process Topology Mechanism , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[12] Henri E. Bal,et al. MagPIe: MPI's collective communication operations for clustered wide area systems , 1999, PPoPP '99.
[13] Jeffrey S. Vetter,et al. An Empirical Performance Evaluation of Scalable Scientific Applications , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[14] Nicholas Carriero,et al. Linda in context , 1989, CACM.
[15] Massimo Bernaschi,et al. Collective communication operations: experimental results vs. theory , 1998, Concurr. Pract. Exp..
[16] Tao Yang,et al. Optimizing threaded MPI execution on SMP clusters , 2001, ICS '01.
[17] Richard M. Karp,et al. Optimal broadcast and summation in the LogP model , 1993, SPAA '93.
[18] John Markus Bjørndalen,et al. EventSpace - Exposing and Observing Communication Behavior of Parallel Cluster Applications , 2003, Euro-Par.
[19] Xin Yuan,et al. CC--MPI: a compiled communication capable MPI prototype for ethernet switched clusters , 2003, PPoPP '03.
[20] Bruce Lowekamp,et al. ECO: Efficient Collective Operations for communication on heterogeneous networks , 1996, Proceedings of International Conference on Parallel Processing.
[21] William E. Johnston,et al. The NetLogger methodology for high performance distributed systems performance analysis , 1998, Proceedings. The Seventh International Symposium on High Performance Distributed Computing (Cat. No.98TB100244).
[22] Darryl Veitch,et al. PC based precision timing without GPS , 2002, SIGMETRICS '02.
[23] Dennis W. Duke,et al. Proceedings of the 1998 ACM/IEEE conference on Supercomputing , 1998 .
[24] Otto J. Anshus,et al. Configurable Collective Communication in LAM-MPI , 2002 .
[25] Steve Sistare,et al. Optimization of MPI Collectives on Clusters of Large-Scale SMP's , 1999, SC.
[26] Dhabaleswar K. Panda,et al. Fast collective operations using shared and remote memory access protocols on clusters , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[27] Brian Vinter,et al. The Performance of Configurable Collective Communication for LAM-MPI in Clusters and Multi-Clusters , 2002 .
[28] Brian Vinter,et al. PATHS - Integrating the Principles of Method-Combination and Remote Procedure Calls for Run-Time Configuration and Tuning of High-Performance Distributed Applications , 2001 .
[29] Brian Vinter,et al. Java PastSet: a structured distributed shared memory system , 2003, IEE Proc. Softw..