Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-scale Cluster-Systems with special Attention on InfiniBand Networks
[1] Richard Cole, et al. The APRAM: incorporating asynchrony into the PRAM model, 1989, SPAA '89.
[2] Gregory F. Pfister, et al. “Hot spot” contention and combining in multistage interconnection networks, 1985, IEEE Transactions on Computers.
[3] Richard P. Martin, et al. LogP Performance Assessment of Fast Network Interfaces, 1995.
[4] Michael L. Scott, et al. Synchronization without contention, 1991, ASPLOS IV.
[5] Eugene D. Brooks, et al. The butterfly barrier, 1986, International Journal of Parallel Programming.
[6] Yossi Matias, et al. Can shared-memory model serve as a bridging model for parallel computation?, 1997, SPAA '97.
[7] George Bosilca, et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, 2004, PVM/MPI.
[8] Andrew A. Chien, et al. Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming, 1999, ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming.
[9] Dhabaleswar K. Panda, et al. Efficient and scalable barrier over Quadrics and Myrinet with a new NIC-based collective message passing protocol, 2004, 18th International Parallel and Distributed Processing Symposium (IPDPS 2004).
[10] Bruce M. Maggs, et al. Communication-efficient parallel algorithms for distributed random-access machines, 1988, Algorithmica.
[11] M. O'Keefe, et al. Performance Analysis of Hardware Barrier Synchronization, 1989.
[12] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[13] Debra Hensgen, et al. Two algorithms for barrier synchronization, 1988, International Journal of Parallel Programming.
[14] John B. Andrews, et al. Notification and Multicast Networks for Synchronization and Coherence, 1992, J. Parallel Distributed Comput.
[15] Henry G. Dietz, et al. Purdue’s Adapter for Parallel Execution and Rapid Synchronization: The TTL_PAPERS Design, 1995.
[16] Fumihiko Ino, et al. LogGPS: a parallel computational model for synchronization analysis, 2001, PPoPP '01.
[17] Welf Löwe, et al. Upper time bounds for executing PRAM-programs on the LogP-machine, 1995, ICS '95.
[18] Alok Aggarwal, et al. On communication latency in PRAM computations, 1989, SPAA '89.
[19] Susanne E. Hambrusch, et al. C³: an architecture-independent model for coarse-grained parallel machines, 1994, Proceedings of the 6th IEEE Symposium on Parallel and Distributed Processing.
[20] Henry G. Dietz, et al. A fine-grain parallel architecture based on barrier synchronization, 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.
[21] R. E. Kessler, et al. Cray T3D: a new dimension for Cray Research, 1993, Digest of Papers, Compcon Spring.
[22] Nian-Feng Tzeng, et al. Distributed shared memory systems with improved barrier synchronization and data transfer, 1997, ICS '97.
[23] Henri E. Bal, et al. MagPIe: MPI's collective communication operations for clustered wide area systems, 1999, PPoPP '99.
[24] Dirk Grunwald, et al. Efficient barriers for distributed shared memory computers, 1994, Proceedings of the 8th International Parallel Processing Symposium.
[25] Henry G. Dietz, et al. Dynamic Barrier Architecture for Multi-Mode Fine-Grain Parallelism Using Conventional Processors, 1994, 1994 International Conference on Parallel Processing, Vol. 1.
[26] W. Daniel Hillis, et al. The Network Architecture of the Connection Machine CM-5, 1996, J. Parallel Distributed Comput.
[27] Bruce M. Maggs, et al. Models of Parallel Computation: A Survey and Synthesis, 1995, Proceedings of the 28th Annual Hawaii International Conference on System Sciences.
[28] Dhabaleswar K. Panda. Fast barrier synchronization in wormhole k-ary n-cube networks with multidestination worms, 1995, Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture.
[29] Steven Fortune, et al. Parallelism in random access machines, 1978, STOC.
[30] Michael L. Scott, et al. Fast, contention-free combining tree barriers for shared-memory multiprocessors, 1994, International Journal of Parallel Programming.
[31] Constantine D. Polychronopoulos, et al. Broadcast Networks for Fast Synchronization, 1991, ICPP.
[32] Michael L. Scott, et al. Algorithms for scalable synchronization on shared-memory multiprocessors, 1991, TOCS.
[33] Allan Gottlieb, et al. Process coordination with fetch-and-increment, 1991, ASPLOS IV.
[34] John von Neumann. First draft of a report on the EDVAC, 1993, IEEE Annals of the History of Computing.
[35] Steven L. Scott, et al. Synchronization and communication in the T3E multiprocessor, 1996, ASPLOS VII.
[36] Jeffrey C. Lagarias, et al. Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions, 1998, SIAM J. Optim.
[37] Dhabaleswar K. Panda. Fast barrier synchronization in wormhole k-ary n-cube networks with multidestination worms, 1995, Future Gener. Comput. Syst.
[38] Amith R. Mamidala, et al. Fast and scalable MPI-level broadcast using InfiniBand's hardware multicast support, 2004, 18th International Parallel and Distributed Processing Symposium (IPDPS 2004).
[39] Leslie G. Valiant. A bridging model for parallel computation, 1990, CACM.
[40] Csaba Andras Moritz, et al. LoGPC: Modeling Network Contention in Message-Passing Programs, 2001, IEEE Trans. Parallel Distributed Syst.
[41] Kurt Mehlhorn, et al. Randomized and deterministic simulations of PRAMs by parallel machines with restricted granularity of parallel memories, 1984, Acta Informatica.
[42] Ralph Grishman, et al. The NYU Ultracomputer: designing a MIMD, shared-memory parallel machine, 1998, ISCA '98.
[43] 张思学 (Zhang Sixue). Computer Hardware Basics, Part 1: CPU (Central Processing Unit), 2005.
[44] Gene M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities, 1967, AFIPS '67 (Spring).
[45] Mary K. Vernon, et al. Efficient synchronization primitives for large-scale cache-coherent multiprocessors, 1989, ASPLOS III.
[46] Dhabaleswar K. Panda, et al. High Performance RDMA-Based MPI Implementation over InfiniBand, 2003, ICS '03.
[47] F. Leighton, et al. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, 1991.
[48] Phillip B. Gibbons. A more practical PRAM model, 1989, SPAA '89.
[49] Dhabaleswar K. Panda, et al. A reliable hardware barrier synchronization scheme, 1997, Proceedings of the 11th International Parallel Processing Symposium.
[50] Jeffrey M. Squyres, et al. The Component Architecture of Open MPI: Enabling Third-Party Collective Algorithms, 2005.
[51] Dhabaleswar K. Panda, et al. Efficient barrier using remote memory operations on VIA-based clusters, 2002, Proceedings of the IEEE International Conference on Cluster Computing.
[52] Luiz Angelo Steffenel, et al. Fast Tuning of Intra-cluster Collective Communications, 2004, PVM/MPI.
[53] James R. Larus, et al. CICO: A Practical Shared-Memory Programming Performance Model, 1994.
[54] Nian-Feng Tzeng, et al. Distributing Hot-Spot Addressing in Large-Scale Multiprocessors, 1987, IEEE Transactions on Computers.
[55] Susanne E. Hambrusch. Models for Parallel Computation, 1996, ICPP Workshop.
[56] Dhabaleswar K. Panda, et al. Design and implementation of MPICH2 over InfiniBand with RDMA support, 2003, 18th International Parallel and Distributed Processing Symposium (IPDPS 2004).
[57] Richard M. Karp, et al. Parallel Algorithms for Shared-Memory Machines, 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.
[58] Ramesh Subramonian, et al. LogP: towards a realistic model of parallel computation, 1993, PPoPP '93.
[59] The MPI Forum. MPI: a message passing interface, 1993, Supercomputing '93.
[60] Chris J. Scheiman, et al. LogGP: incorporating long messages into the LogP model: one step closer towards a realistic model for parallel computation, 1995, SPAA '95.
[61] Anant Agarwal, et al. Limits on Interconnection Network Performance, 1991, IEEE Trans. Parallel Distributed Syst.