The Cost of Synchronizing Imbalanced Processes in Message Passing Systems
[1] Ramesh Subramonian,et al. LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.
[2] Chris J. Scheiman,et al. LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation , 1995, SPAA '95.
[3] Susan Coghlan,et al. Benchmarking the effects of operating system interference on extreme-scale parallel machines , 2008, Cluster Computing.
[4] Scott Pakin,et al. Unresponsiveness-Tolerant Collective Communication , 2001.
[5] Rajiv Gupta,et al. A scalable implementation of barrier synchronization using an adaptive combining tree , 1990, International Journal of Parallel Programming.
[6] Scott Pakin,et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q , 2003, SC.
[7] Michael L. Scott,et al. Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.
[8] Erwin Laure,et al. Energetic particles in magnetotail reconnection , 2014, Journal of Plasma Physics.
[9] Torsten Hoefler,et al. A practical approach to the rating of barrier algorithms using the LogP model and Open MPI , 2005, 2005 International Conference on Parallel Processing Workshops (ICPPW'05).
[10] John Shalf,et al. Exascale Computing Technology Challenges , 2010, VECPAR.
[11] Erwin Laure,et al. Idle waves in high-performance computing , 2015, Physical review. E, Statistical, nonlinear, and soft matter physics.
[12] Torsten Hoefler,et al. Fast barrier synchronization for InfiniBand™ , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[13] Richard M. Karp,et al. Optimal broadcast and summation in the LogP model , 1993, SPAA '93.
[14] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl.
[15] Torsten Hoefler,et al. Characterizing the Influence of System Noise on Large-Scale Applications by Simulation , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[16] Susan Coghlan,et al. The Influence of Operating Systems on the Performance of Collective Operations at Extreme Scale , 2006, 2006 IEEE International Conference on Cluster Computing.
[17] Eli Upfal,et al. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems , 1997, IEEE Trans. Parallel Distributed Syst.
[18] Debra Hensgen,et al. Two algorithms for barrier synchronization , 1988, International Journal of Parallel Programming.
[19] Torsten Hoefler,et al. Accurately measuring collective operations at massive scale , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[20] Robert A. van de Geijn,et al. Global combine on mesh architectures with wormhole routing , 1993, Proceedings Seventh International Parallel Processing Symposium.
[21] Nisheeth K. Vishnoi,et al. The Impact of Noise on the Scaling of Collectives: A Theoretical Approach , 2005, HiPC.
[22] Ron Brightwell,et al. Characterizing application sensitivity to OS interference using kernel-level noise injection , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[23] Amith R. Mamidala,et al. MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations , 2009, Hot Interconnects.
[24] Eugene D. Brooks,et al. The butterfly barrier , 1986, International Journal of Parallel Programming.
[25] Jack J. Dongarra,et al. MPI Collective Algorithm Selection and Quadtree Encoding , 2006, PVM/MPI.