XBFS: eXploring Runtime Optimizations for Breadth-First Search on GPUs
Fan Yao | Hang Liu | Anil Gaihre | Zhenlin Wu
[1] Guy E. Blelloch,et al. Parallelism in Randomized Incremental Algorithms , 2018, J. ACM.
[2] H. Howie Huang,et al. iBFS: Concurrent Breadth-First Search on GPUs , 2016, SIGMOD Conference.
[3] David A. Bader,et al. Scalable and High Performance Betweenness Centrality on the GPU , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[4] David A. Patterson,et al. Direction-optimizing Breadth-First Search , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Laxmi N. Bhuyan,et al. Scalable SIMD-Efficient Graph Processing on GPUs , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[6] Keval Vora,et al. CuSha: vertex-centric graph processing on GPUs , 2014, HPDC '14.
[7] Hang Liu,et al. SIMD-X: Programming and Processing of Graph Algorithms on GPUs , 2018, USENIX Annual Technical Conference.
[8] H. Howie Huang,et al. TriCore: parallel triangle counting on GPUs , 2018 .
[9] Michela Becchi,et al. Deploying Graph Algorithms on GPUs: An Adaptive Solution , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[10] H. Howie Huang,et al. High-Performance Triangle Counting on GPUs , 2018, 2018 IEEE High Performance extreme Computing Conference (HPEC).
[11] Martin D. F. Wong,et al. An effective GPU implementation of breadth-first search , 2010, Design Automation Conference.
[12] H. Howie Huang,et al. iSpan: Parallel Identification of Strongly Connected Components with Spanning Trees , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Kamesh Madduri,et al. Parallel breadth-first search on distributed memory systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[14] Mark J. Harris,et al. Parallel Prefix Sum (Scan) with CUDA , 2011 .
[15] H. Howie Huang,et al. CECI: Compact Embedding Cluster Index for Scalable Subgraph Matching , 2019, SIGMOD Conference.
[16] Andrew S. Grimshaw,et al. Scalable GPU graph traversal , 2012, PPoPP '12.
[17] Wenguang Chen,et al. Gemini: A Computation-Centric Distributed Graph Processing System , 2016, OSDI.
[18] H. Howie Huang,et al. TriCore: Parallel Triangle Counting on GPUs , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[19] Bo Wu,et al. Graphie: Large-Scale Asynchronous Graph Traversals on Just a GPU , 2017, 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[20] H. Howie Huang,et al. Enterprise: breadth-first graph traversal on GPUs , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[21] Zhijia Zhao,et al. Tigr: Transforming Irregular Graphs for GPU-Friendly Graph Processing , 2018, ASPLOS.
[22] Dimitrios S. Nikolopoulos,et al. GraphGrind: addressing load imbalance of graph partitioning , 2017, ICS.
[23] H. Howie Huang,et al. Graphene: Fine-Grained IO Management for Graph Computing , 2017, FAST.
[24] Ed Anderson,et al. LAPACK Users' Guide , 1995 .
[25] Alexander Tiskin,et al. All-Pairs Shortest Paths Computation in the BSP Model , 2001, ICALP.
[26] Peter Schwabe. Graphics Processing Units , 2014, Secure Smart Embedded Devices, Platforms and Applications.
[27] Kunle Olukotun,et al. Efficient Parallel Graph Exploration on Multi-Core CPU and GPU , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[28] Sabu M. Thampi,et al. Survey of Search and Replication Schemes in Unstructured P2p Networks , 2010, Netw. Protoc. Algorithms.
[29] Kunle Olukotun,et al. Accelerating CUDA graph algorithms at maximum warp , 2011, PPoPP '11.
[30] John D. Owens,et al. Gunrock: a high-performance graph processing library on the GPU , 2015, PPoPP.
[31] Guy E. Blelloch,et al. Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.
[32] Keshav Pingali,et al. Groute: An Asynchronous Multi-GPU Programming Model for Irregular Computations , 2017, PPoPP.
[33] Nancy M. Amato,et al. Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.