GGAS: Global GPU address spaces for efficient communication in heterogeneous clusters
暂无分享,去创建一个
[1] Tong Liu,et al. The development of Mellanox/NVIDIA GPUDirect over InfiniBand—a new model for GPU to GPU communications , 2011, Computer Science - Research and Development.
[2] Pradeep Dubey,et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU , 2010, ISCA.
[3] Massimiliano Fatica,et al. Implementing the Himeno benchmark with CUDA on GPU clusters , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[4] John D. Owens,et al. Distributed texture memory in a multi-GPU environment , 2006, GH '06.
[5] Steven L. Scott,et al. Synchronization and communication in the T3E multiprocessor , 1996, ASPLOS VII.
[6] Wu-chun Feng,et al. MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-based Systems , 2012, 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems.
[7] Wu-chun Feng,et al. Inter-block GPU communication via fast barrier synchronization , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[8] Jie Cheng,et al. Programming Massively Parallel Processors. A Hands-on Approach , 2010, Scalable Comput. Pract. Exp..
[9] John D. Owens,et al. Message passing on data-parallel architectures , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[10] Holger Fröning,et al. Efficient hardware support for the Partitioned Global Address Space , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[11] Dhabaleswar K. Panda,et al. Optimizing MPI Communication on Multi-GPU Systems Using CUDA Inter-Process Communication , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[12] Sayantan Sur,et al. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters , 2011, Computer Science - Research and Development.
[13] Yutaka Ishikawa,et al. Direct MPI Library for Intel Xeon Phi Co-Processors , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[14] Holger Fröning,et al. Highly scalable barriers for future high-performance computing clusters , 2011, 2011 18th International Conference on High Performance Computing.
[15] Massimo Bernaschi,et al. GPU Peer-to-Peer Techniques Applied to a Cluster Interconnect , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.