Deploying Graph Algorithms on GPUs: An Adaptive Solution
Michela Becchi | Da Li
[1] P. J. Narayanan, et al. Accelerating Large Graph Algorithms on the GPU Using CUDA, 2007, HiPC.
[2] Pradeep Dubey, et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU, 2010, ISCA.
[3] Lubos Brim, et al. Computing Strongly Connected Components in Parallel on CUDA, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[4] Olga Kurasova, et al. Parallel Bidirectional Dijkstra's Shortest Path Algorithm, 2010, DB&IS.
[5] David A. Bader, et al. Fast Shared-Memory Algorithms for Computing the Minimum Spanning Forest of Sparse Graphs, 2004, IPDPS.
[6] Andrew S. Grimshaw, et al. Scalable GPU graph traversal, 2012, PPoPP '12.
[7] Kevin Skadron, et al. Scalable parallel programming, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[8] David A. Bader, et al. Scalable Graph Exploration on Multicore Processors, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[9] Frank Dehne, et al. Practical parallel algorithms for minimum spanning trees, 1998, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems.
[10] David A. Bader, et al. Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2, 2006, 2006 International Conference on Parallel Processing (ICPP'06).
[11] Nancy M. Amato, et al. STAPL: An Adaptive, Generic Parallel C++ Library, 2001, LCPC.
[12] Martin D. F. Wong, et al. An effective GPU implementation of breadth-first search, 2010, Design Automation Conference.
[13] Charles E. Leiserson, et al. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers), 2010, SPAA '10.
[14] Keshav Pingali, et al. A GPU implementation of inclusion-based points-to analysis, 2012, PPoPP '12.
[15] Douglas P. Gregor, et al. The Parallel BGL: A Generic Library for Distributed Graph Computations, 2005.
[16] R. K. Shyamasundar, et al. Introduction to algorithms, 1996.
[17] Kunle Olukotun, et al. Efficient Parallel Graph Exploration on Multi-Core CPU and GPU, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[18] Kunle Olukotun, et al. Accelerating CUDA graph algorithms at maximum warp, 2011, PPoPP '11.
[19] Keshav Pingali, et al. Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms, 2011, PPoPP '11.
[20] Lawrence Rauchwerger, et al. Finding strongly connected components in parallel in particle transport sweeps, 2001, SPAA '01.
[21] Warren Schudy, et al. Finding strongly connected components in parallel using O(log^2 n) reachability queries, 2008, SPAA '08.
[22] Keshav Pingali, et al. How much parallelism is there in irregular applications?, 2009, PPoPP '09.
[23] Kurt Mehlhorn, et al. A Parallelization of Dijkstra's Shortest Path Algorithm, 1998, MFCS.
[24] Keshav Pingali, et al. Optimistic parallelism requires abstractions, 2009, CACM.
[25] Donald B. Johnson, et al. A parallel algorithm for computing minimum spanning trees, 1992, SPAA '92.
[26] Václav Koubek, et al. Parallel algorithms for connected components in a graph, 1985, FCT.
[27] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).