Graph Processing on GPUs

[1]  Elwood S. Buffa,et al.  Graph Theory with Applications, 1977.

[2]  Udo Hahn,et al.  Computing text Constituency: An Algorithmic Approach to the Generation of Text Graphs , 1984, SIGIR.

[3]  Edward A. Lee,et al.  Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing , 1989, IEEE Transactions on Computers.

[4]  Gul A. Agha,et al.  ACTORS - a model of concurrent computation in distributed systems , 1985, MIT Press series in artificial intelligence.

[5]  Leslie G. Valiant,et al.  A bridging model for parallel computation , 1990, CACM.

[6]  Guy E. Blelloch,et al.  Vector Models for Data-Parallel Computing, 1990.

[7]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[8]  Ramesh Subramonian,et al.  LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.

[9]  Matthew Richardson,et al.  The Intelligent surfer: Probabilistic Combination of Link and Content Information in PageRank , 2001, NIPS.

[10]  Jens H. Krüger,et al.  A Survey of General‐Purpose Computation on Graphics Hardware , 2007, Eurographics.

[11]  Tor M. Aamodt,et al.  Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[12]  P. J. Narayanan,et al.  Accelerating Large Graph Algorithms on the GPU Using CUDA , 2007, HiPC.

[13]  Richard T. Watson,et al.  The centrality and prestige of CACM , 2008, CACM.

[14]  P. J. Narayanan,et al.  CUDA cuts: Fast graph cuts on the GPU , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[15]  Michael Garland,et al.  Efficient Sparse Matrix-Vector Multiplication on CUDA, 2008.

[16]  Naga K. Govindaraju,et al.  Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[17]  Joseph T. Kider,et al.  All-pairs shortest-paths for large graphs on the GPU , 2008, GH '08.

[18]  P. J. Narayanan,et al.  Fast minimum spanning tree for large graphs on the GPU , 2009, High Performance Graphics.

[19]  Michael Garland,et al.  Implementing sparse matrix-vector multiplication on throughput-oriented processors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[20]  Edward T. Grochowski,et al.  Larrabee: A many-Core x86 architecture for visual computing , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).

[21]  James Demmel,et al.  the Parallel Computing Landscape, 2022.

[22]  Gregory E. Chamitoff,et al.  Orders-of-magnitude performance increases in GPU-accelerated correlation of images from the International Space Station , 2010, Journal of Real-Time Image Processing.

[23]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[24]  Arutyun Avetisyan,et al.  Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs , 2009, SAMOS.

[25]  John R. Gilbert,et al.  Solving path problems on the GPU , 2010, Parallel Comput.

[26]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[27]  William Gropp,et al.  An adaptive performance modeling tool for GPU architectures , 2010, PPoPP '10.

[28]  Kai Li,et al.  Fidelity and scaling of the PARSEC benchmark inputs , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).

[29]  Hong Chen,et al.  Parallel SimRank computation on large graphs with iterative aggregation , 2010, KDD.

[30]  Bin Wu,et al.  Cloud-based Connected Component Algorithm , 2010, 2010 International Conference on Artificial Intelligence and Computational Intelligence.

[31]  Borko Furht,et al.  Exploring NVIDIA-CUDA for video coding , 2010, MMSys '10.

[32]  Kevin Skadron,et al.  Dynamic warp subdivision for integrated branch and memory divergence tolerance , 2010, ISCA.

[33]  P. J. Narayanan,et al.  A fast GPU algorithm for graph connectivity , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).

[34]  Feng Yan,et al.  Efficient PageRank and SpMV Computation on AMD GPUs , 2010, 2010 39th International Conference on Parallel Processing.

[35]  Kunle Olukotun,et al.  Accelerating CUDA graph algorithms at maximum warp , 2011, PPoPP '11.

[36]  Emmett Kilgariff,et al.  Fermi GF100 GPU Architecture , 2011, IEEE Micro.

[37]  Kunle Olukotun,et al.  Efficient Parallel Graph Exploration on Multi-Core CPU and GPU , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[38]  Bingsheng He,et al.  Mars: Accelerating MapReduce with Graphics Processors , 2011, IEEE Transactions on Parallel and Distributed Systems.

[39]  Rob H. Bisseling,et al.  A GPU Algorithm for Greedy Graph Matching , 2011, Facing the Multicore-Challenge.

[40]  Onur Mutlu,et al.  Improving GPU performance via large warps and two-level warp scheduling , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[41]  Kurt Keutzer,et al.  clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs , 2012, ICS '12.

[42]  Andrew S. Grimshaw,et al.  Scalable GPU graph traversal , 2012, PPoPP '12.

[43]  Arnon Rungsawang,et al.  Fast PageRank Computation on a GPU Cluster , 2012, 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing.

[44]  Guy E. Blelloch,et al.  GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.

[45]  Nicolas Brunie,et al.  Simultaneous branch and warp interweaving for sustained GPU performance , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[46]  Jared Hoberock,et al.  Edge v. Node Parallelism for Graph Centrality Metrics, 2012.

[47]  Matei Ripeanu,et al.  A yoke of oxen and a thousand chickens for heavy lifting graph processing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[48]  Kunle Olukotun,et al.  Green-Marl: a DSL for easy and efficient graph analysis , 2012, ASPLOS XVII.

[49]  Joseph Gonzalez,et al.  PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[50]  Willy Zwaenepoel,et al.  X-Stream: edge-centric graph processing using streaming partitions , 2013, SOSP.

[51]  Jianlong Zhong,et al.  Towards GPU-Accelerated Large-Scale Graph Processing in the Cloud , 2013, 2013 IEEE 5th International Conference on Cloud Computing Technology and Science.

[52]  Michela Becchi,et al.  Deploying Graph Algorithms on GPUs: An Adaptive Solution , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[53]  Keshav Pingali,et al.  Morph algorithms on GPUs , 2013, PPoPP '13.

[54]  Panos Kalnis,et al.  Mizan: a system for dynamic load balancing in large-scale graph processing , 2013, EuroSys '13.

[55]  Jinha Kim,et al.  TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC , 2013, KDD.

[56]  Keval Vora,et al.  CuSha: vertex-centric graph processing on GPUs , 2014, HPDC '14.

[57]  Mihai Surdeanu,et al.  The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.

[58]  Bin Li,et al.  Distributed cooperative localization based on Gaussian message passing on factor graph in wireless networks , 2015, Science China Information Sciences.

[59]  Bingsheng He,et al.  In-Cache Query Co-Processing on Coupled CPU-GPU Architectures , 2014, Proc. VLDB Endow.

[60]  Shengen Yan,et al.  yaSpMV: yet another SpMV framework on GPUs , 2014, PPoPP.

[61]  Zhisong Fu,et al.  MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs , 2014, GRADES.

[62]  Michael Garland,et al.  Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[63]  Bing Yang,et al.  BiELL: A bisection ELLPACK-based storage format for optimizing SpMV on GPUs , 2014, J. Parallel Distributed Comput.

[64]  Srinivasan Parthasarathy,et al.  Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.

[65]  Jianlong Zhong,et al.  Medusa: Simplified Graph Processing on GPUs , 2014, IEEE Transactions on Parallel and Distributed Systems.

[66]  Zhenguo Li,et al.  VENUS: Vertex-centric streamlined graph computation on a single PC , 2015, 2015 IEEE 31st International Conference on Data Engineering.

[67]  Sudipto Guha,et al.  Vertex and Hyperedge Connectivity in Dynamic Graph Streams , 2015, PODS.

[68]  Karsten Schwan,et al.  GraphReduce: processing large-scale graphs on accelerator-based systems , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[69]  H. Howie Huang,et al.  Enterprise: breadth-first graph traversal on GPUs , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[70]  Kenli Li,et al.  Performance Analysis and Optimization for SpMV on GPU Using Probabilistic Modeling , 2015, IEEE Transactions on Parallel and Distributed Systems.

[71]  Hai Jin,et al.  Optimization of asynchronous graph processing on GPU with hybrid coloring model , 2015, PPoPP.

[72]  John D. Owens,et al.  Gunrock: a high-performance graph processing library on the GPU , 2015, PPoPP.

[73]  Wenguang Chen,et al.  GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning , 2015, USENIX ATC.

[74]  Jinwook Kim,et al.  GStream: a graph streaming processing method for large-scale graphs on GPUs , 2015, PPoPP.

[75]  Michael I. Jordan,et al.  Machine learning: Trends, perspectives, and prospects , 2015, Science.

[76]  Alexandros G. Dimakis,et al.  FrogWild! - Fast PageRank Approximations on Graph Engines , 2015, Proc. VLDB Endow.

[77]  Ariful Azad,et al.  A Parallel Tree Grafting Algorithm for Maximum Cardinality Matching in Bipartite Graphs , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.

[78]  Scott McMillan,et al.  GBTL-CUDA: Graph Algorithms and Primitives for GPUs , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[79]  H. Howie Huang,et al.  iBFS: Concurrent Breadth-First Search on GPUs , 2016, SIGMOD Conference.

[80]  Feng Shi,et al.  Sparse Matrix Format Selection with Multiclass SVM for SpMV on GPU , 2016, 2016 45th International Conference on Parallel Processing (ICPP).

[81]  Jinwook Kim,et al.  GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs , 2016, SIGMOD Conference.

[82]  Wenguang Chen,et al.  FinePar: Irregularity-aware fine-grained workload partitioning on integrated architectures , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[83]  Davide Barbieri,et al.  Sparse Matrix-Vector Multiplication on GPGPUs , 2017, ACM Trans. Math. Softw.