Paper Information - A Distributed Multi-GPU System for Fast Graph Processing

We present Lux, a distributed multi-GPU system that achieves fast graph processing by exploiting the aggregate memory bandwidth of multiple GPUs and taking advantage of locality in the memory hierarchy of multi-GPU clusters. Lux provides two execution models that optimize algorithmic efficiency and enable important GPU optimizations, respectively. Lux also uses a novel dynamic load balancing strategy that is cheap and achieves good load balance across GPUs. In addition, we present a performance model that quantitatively predicts the execution times and automatically selects the runtime configurations for Lux applications. Experiments show that Lux achieves up to 20× speedup over state-of-the-art shared memory systems and up to two orders of magnitude speedup over distributed systems.

PVLDB Reference Format: Zhihao Jia, Yongkee Kwon, Galen Shipman, Pat McCormick, Mattan Erez, and Alex Aiken. A Distributed Multi-GPU System for Fast Graph Processing. PVLDB, 11(3): 297-310, 2017. DOI: 10.14778/3157794.3157799
