Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM
[1] D. Panda, et al. Extending OpenSHMEM for GPU Computing, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[2] Wu-chun Feng, et al. MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-based Systems, 2012, 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems.
[3] Satoshi Matsuoka. Making TSUBAME2.0, the world's greenest production supercomputer, even greener — Challenges to the architects, 2011, IEEE/ACM International Symposium on Low Power Electronics and Design.
[4] Vijay Saraswat, et al. GPU programming in a high level language: compiling X10 to CUDA, 2011, X10 '11.
[5] Hiroki Honda, et al. FLAT: a GPU programming framework to provide embedded MPI, 2012, GPGPU-5.
[6] Andrew S. Grimshaw, et al. Scalable GPU graph traversal, 2012, PPoPP '12.
[7] Koji Ueno, et al. Parallel distributed breadth first search on GPU, 2013, 20th Annual International Conference on High Performance Computing.
[8] Dhabaleswar K. Panda, et al. Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs, 2013, 2013 42nd International Conference on Parallel Processing.
[9] Sayantan Sur, et al. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters, 2011, Computer Science - Research and Development.
[10] Duncan Poole, et al. Exploring OpenSHMEM Model to Program GPU-based Extreme-Scale Systems, 2015, OpenSHMEM.
[11] Massimo Bernaschi, et al. Parallel Distributed Breadth First Search on the Kepler Architecture, 2016, IEEE Transactions on Parallel and Distributed Systems.