Runtime detection and optimization of collective communication patterns
[1] Sandia Report,et al. The Portals 4.0 Message Passing Interface, 2008.
[2] Martin Schulz,et al. Detecting Patterns in MPI Communication Traces , 2008, 2008 37th International Conference on Parallel Processing.
[3] Zhaofang Wen,et al. Automatic Algorithm Recognition and Replacement: A New Approach to Program Optimization, 2000.
[4] Katherine Yelick,et al. UPC Language Specifications V1.1.1, 2003.
[6] Jehoshua Bruck,et al. Efficient algorithms for all-to-all communications in multi-port message-passing systems , 1994, SPAA '94.
[7] Jesper Larsson Träff,et al. Two-tree algorithms for full bandwidth broadcast, reduction and scan , 2009, Parallel Comput.
[8] Larry Kaplan,et al. The Gemini System Interconnect , 2010, 2010 18th IEEE Symposium on High Performance Interconnects.
[9] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[10] Michael L. Scott,et al. Fast, contention-free combining tree barriers for shared-memory multiprocessors , 1994, International Journal of Parallel Programming.
[11] Martin Schulz,et al. Transforming MPI source code based on communication patterns , 2010, Future Gener. Comput. Syst.
[12] Greg Bronevetsky,et al. Communication-Sensitive Static Dataflow for Parallel Message Passing Applications , 2009, 2009 International Symposium on Code Generation and Optimization.
[14] Torsten Hoefler,et al. Communication-centric optimizations by dynamically detecting collective operations , 2012, PPoPP '12.
[15] Torsten Hoefler,et al. The PERCS High-Performance Interconnect , 2010, 2010 18th IEEE Symposium on High Performance Interconnects.
[16] Tarek A. El-Ghazawi,et al. An evaluation of global address space languages: co-array fortran and unified parallel C , 2005, PPoPP.
[17] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[18] Jesper Larsson Träff,et al. Self-Consistent MPI Performance Guidelines , 2010, IEEE Transactions on Parallel and Distributed Systems.
[19] Dieter Kranzlmüller,et al. Detection of Collective MPI Operation Patterns , 2004, PVM/MPI.
[20] Martin Schulz,et al. A Scalable and Distributed Dynamic Formal Verifier for MPI Programs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[21] Hans-Wolfgang Loidl,et al. Semi-Explicit Parallel Programming in a Purely Functional Style: GpH, 2009.
[23] Robert A. van de Geijn,et al. High-performance implementation of the level-3 BLAS , 2008, TOMS.
[25] Bradford L. Chamberlain,et al. The cascade high productivity language , 2004, Ninth International Workshop on High-Level Parallel Programming Models and Supportive Environments, 2004. Proceedings.
[26] Jack J. Dongarra,et al. Performance analysis of MPI collective operations , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[27] Philip Heidelberger,et al. The deep computing messaging framework: generalized scalable message passing on the blue gene/P supercomputer , 2008, ICS '08.
[28] J. Dongarra. Performance of various computers using standard linear equations software , 1990, CARN.
[29] Martin Schulz,et al. Using MPI Communication Patterns to Guide Source Code Transformations , 2008, ICCS.
[30] Robert W. Numrich,et al. Co-array Fortran for parallel programming , 1998, FORF.
[31] William N. Scherer,et al. A new vision for coarray Fortran , 2009, PGAS '09.
[32] Steve Poole,et al. ConnectX-2 InfiniBand Management Queues: First Investigation of the New Support for Network Offloaded Collective Operations , 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[33] Keith D. Underwood,et al. Fine-Grained Message Pipelining for Improved MPI Performance , 2006, 2006 IEEE International Conference on Cluster Computing.
[34] Amitabha Sanyal,et al. Data Flow Analysis - Theory and Practice, 2009.
[35] Scott Pakin. Receiver-initiated message passing over RDMA Networks , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.