Delta Send-Recv for Dynamic Pipelining in MPI Programs
Chen Ding | Yaoqing Gao | Roch Archambault | Bin Bao
[1] D. Martin Swany, et al. MPI-aware compiler optimizations for improving communication-computation overlap, 2009, ICS.
[2] Eduard Ayguadé, et al. Overlapping communication and computation by using a hybrid MPI/SMPSs approach, 2010, ICS '10.
[3] Keith D. Cooper, et al. Engineering a Compiler, 2003.
[4] Martin Burtscher, et al. Tolerating Message Latency Through the Early Release of Blocked Receives, 2005, Euro-Par.
[5] J.C. Sancho, et al. Quantifying the Potential Benefit of Overlapping Communication and Computation in Large-Scale Scientific Applications, 2006, ACM/IEEE SC 2006 Conference (SC'06).
[6] K. Timson, et al. Center for research on parallel computation, 1992.
[7] Rajeev Thakur, et al. Optimization of Collective Communication Operations in MPICH, 2005, Int. J. High Perform. Comput. Appl.
[8] Martin Schulz, et al. Exploiting Data Similarity to Reduce Memory Footprints, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[9] Paul D. Hovland, et al. Data-Flow Analysis for MPI Programs, 2006, 2006 International Conference on Parallel Processing (ICPP'06).
[10] Jack J. Dongarra, et al. Overlapping Computation and Communication for Advection on Hybrid Parallel Computers, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[11] Eric P. Chassignet, et al. North Atlantic Simulations with the Hybrid Coordinate Ocean Model (HYCOM): Impact of the Vertical Coordinate Choice, Reference Pressure, and Thermobaricity, 2003.
[12] Ken Kennedy, et al. A balanced code placement framework, 2000, TOPL.
[13] Ken Kennedy, et al. Advanced optimization strategies in the Rice dHPF compiler, 2002, Concurr. Comput. Pract. Exp.
[14] Jesús Labarta, et al. A dependency-aware task-based programming environment for multi-core architectures, 2008, 2008 IEEE International Conference on Cluster Computing.
[15] Torsten Hoefler, et al. Implementation and performance analysis of non-blocking collective operations for MPI, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[16] Yuan Zhang, et al. Barrier matching for programs with textually unaligned barriers, 2007, PPoPP.
[17] Ken Kennedy, et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2001.
[18] George Bosilca, et al. TEG: A High-Performance, Scalable, Multi-network Point-to-Point Communications Methodology, 2004, PVM/MPI.
[19] Jesper Larsson Träff, et al. A Simple, Pipelined Algorithm for Large, Irregular All-gather Problems, 2008, PVM/MPI.
[20] Michael Wolfe, et al. A New Approach to Array Redistribution: Strip Mining Redistribution, 1994, PARLE.
[21] Martin Schulz, et al. Transforming MPI source code based on communication patterns, 2010, Future Gener. Comput. Syst.
[22] Torsten Hoefler, et al. Non-Blocking Collective Operations for MPI-2, 2006.
[23] Costin Iancu, et al. HUNTing the overlap, 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[24] Greg Bronevetsky, et al. Communication-Sensitive Static Dataflow for Parallel Message Passing Applications, 2009, 2009 International Symposium on Code Generation and Optimization.
[25] David H. Bailey, et al. The NAS Parallel Benchmarks, 1991, Int. J. High Perform. Comput. Appl.