Optimizing communication for a 2D-partitioned scalable BFS
Holger Fröning | Jeffrey S. Young | Julian Romera | Matthias Hauck
[1] Katsuki Fujisawa, et al. Fast and scalable NUMA-based thread parallel breadth-first search, 2015, 2015 International Conference on High Performance Computing & Simulation (HPCS).
[2] Fabio Checconi, et al. Exploring network optimizations for large-scale graph analytics, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Jonathan Goldstein, et al. Compressing relations and indexes, 1998, Proceedings 14th International Conference on Data Engineering.
[4] Gang Wang, et al. Efficient Parallel Lists Intersection and Index Compression Algorithms using Graphics Processing Units, 2011, Proc. VLDB Endow.
[5] Mingyu Chen, et al. Compression and Sieve: Reducing Communication in Parallel Breadth First Search on Distributed Memory Systems, 2012, ArXiv.
[6] Koji Ueno, et al. 2D Partitioning Based Graph Search for the Graph500 Benchmark, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[7] Kunle Olukotun, et al. Accelerating CUDA graph algorithms at maximum warp, 2011, PPoPP '11.
[8] Marcin Zukowski, et al. Super-Scalar RAM-CPU Cache Compression, 2006, 22nd International Conference on Data Engineering (ICDE'06).
[9] Leonid Boytsov, et al. SIMD compression and the intersection of sorted integers, 2014, Softw. Pract. Exp.
[10] Massimo Bernaschi, et al. Efficient breadth first search on multi-GPU systems, 2013, J. Parallel Distributed Comput.
[11] Chinya V. Ravishankar, et al. Block-Oriented Compression Techniques for Large Statistical Databases, 1997, IEEE Trans. Knowl. Data Eng.
[12] David A. Patterson, et al. Direction-optimizing Breadth-First Search, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Tong Liu, et al. The development of Mellanox/NVIDIA GPUDirect over InfiniBand: a new model for GPU to GPU communications, 2011, Computer Science - Research and Development.
[14] Nancy M. Amato, et al. Scaling Techniques for Massive Scale-Free Graphs in Distributed (External) Memory, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[15] Julian Romera, et al. Optimizing Communication by Compression for Multi-GPU Scalable Breadth-First Searches, 2017, ArXiv.
[16] M. Delignette-Muller, et al. fitdistrplus: An R Package for Fitting Distributions, 2015.
[17] M. Żukowski, et al. Balancing vectorized query execution with bandwidth-optimized storage, 2009.
[18] Massimo Bernaschi, et al. Parallel Distributed Breadth First Search on the Kepler Architecture, 2016, IEEE Transactions on Parallel and Distributed Systems.
[19] Leonid Boytsov, et al. Decoding billions of integers per second through vectorization, 2012, Softw. Pract. Exp.
[20] Daniel Lemire, et al. Vectorized VByte Decoding, 2015, ArXiv.
[21] Fabio Checconi, et al. Massive data analytics: The Graph 500 on IBM Blue Gene/Q, 2013, IBM J. Res. Dev.
[22] Alexander A. Stepanov, et al. SIMD-based decoding of posting lists, 2011, CIKM '11.
[23] Koji Ueno, et al. Highly scalable graph search for the Graph500 benchmark, 2012, HPDC '12.
[24] Guy E. Blelloch, et al. Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+, 2015, 2015 Data Compression Conference.
[25] Dirk Schmidl, et al. Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir, 2011, Parallel Tools Workshop.
[26] Ahmad Afsahi, et al. GPU-Aware Intranode MPI_Allreduce, 2014, EuroMPI/ASIA.
[27] Martin D. F. Wong, et al. An effective GPU implementation of breadth-first search, 2010, Design Automation Conference.
[28] Massimo Bernaschi, et al. Breadth First Search on APEnet+, 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[29] Koji Ueno, et al. Parallel distributed breadth first search on GPU, 2013, 20th Annual International Conference on High Performance Computing.
[30] Kamesh Madduri, et al. Parallel breadth-first search on distributed memory systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[31] Andrew S. Grimshaw, et al. Scalable GPU graph traversal, 2012, PPoPP '12.
[32] Dhabaleswar K. Panda, et al. Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs, 2013, 2013 42nd International Conference on Parallel Processing.