Work efficient parallel algorithms for large graph exploration

Graph algorithms play a prominent role in several fields of science and engineering. Notable among them are graph traversal, finding the connected components of a graph, and computing shortest paths. Efficient implementations of these problems exist for a variety of modern multiprocessor architectures. In recent years, however, the graphs that arise from real-world data sets have grown steadily larger. Parallelism offers only limited relief, because current parallel architectures have severe shortcomings when deployed for most graph algorithms. At the same time, these graphs are becoming increasingly sparse. This calls for work-efficient solutions aimed specifically at processing large, sparse graphs on modern parallel architectures. In this paper, we introduce graph pruning, a technique that reduces the size of the graph. Depending on the nature of the computation, certain elements of the graph can be pruned; once a solution is obtained for the pruned graph, it is extended to the entire graph. We apply this technique to three fundamental graph algorithms: breadth-first search (BFS), connected components (CC), and all-pairs shortest paths (APSP). To validate our technique, we implement our algorithms on a heterogeneous platform consisting of a multicore CPU and a GPU. On this platform, we achieve an average improvement of 35% over state-of-the-art solutions. Such an improvement has the potential to speed up other applications that rely on these algorithms.
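The prune-solve-extend idea can be illustrated with a small sequential sketch. The abstract does not say which elements are pruned, so this sketch assumes one simple rule: remove degree-1 (pendant) vertices whose single neighbour has degree greater than 1. Such a vertex can never be an interior vertex of a path, so removing it changes neither connectivity nor distances among the remaining vertices. All function names (`prune_pendants`, `cc_with_pruning`, `bfs_with_pruning`) are hypothetical, the graph is assumed to be simple, undirected, and stored as a dict of adjacency lists, and nothing here reflects the paper's CPU+GPU implementation.

```python
from collections import deque


def prune_pendants(adj, keep=None):
    """Hypothetical pruning rule (an assumption, not the paper's stated rule).

    Prune v if it has exactly one neighbour u and deg(u) > 1, so an isolated
    edge u-v stays in the core. `keep` is never pruned (e.g. the BFS source).
    Returns (core adjacency, {pruned vertex: its neighbour}).
    """
    pendant = {}
    for v, nbrs in adj.items():
        if v != keep and len(nbrs) == 1:
            (u,) = nbrs
            if len(adj[u]) > 1:
                pendant[v] = u
    core = {v: [w for w in nbrs if w not in pendant]
            for v, nbrs in adj.items() if v not in pendant}
    return core, pendant


def connected_components(adj):
    """Label each vertex with the first vertex of its component (plain BFS)."""
    label = {}
    for s in adj:
        if s in label:
            continue
        label[s] = s
        q = deque([s])
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in label:
                    label[w] = s
                    q.append(w)
    return label


def bfs_levels(adj, src):
    """Hop distance from src to every reachable vertex."""
    dist = {src: 0}
    q = deque([src])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return dist


def cc_with_pruning(adj):
    core, pendant = prune_pendants(adj)
    label = connected_components(core)  # solve on the smaller graph
    for v, u in pendant.items():        # extend: v joins its neighbour's component
        label[v] = label[u]
    return label


def bfs_with_pruning(adj, src):
    core, pendant = prune_pendants(adj, keep=src)
    dist = bfs_levels(core, src)        # solve on the smaller graph
    for v, u in pendant.items():        # extend: the only way into v is via u
        if u in dist:                   # unreachable neighbour => unreachable v
            dist[v] = dist[u] + 1
    return dist


# A triangle 0-1-2 with pendants 3 (on 0) and 6 (on 2), plus an isolated edge 4-5.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 6], 3: [0], 4: [5], 5: [4], 6: [2]}
print(cc_with_pruning(adj))
print(bfs_with_pruning(adj, 0))
```

In this example, vertices 3 and 6 are pruned, while 4 and 5 stay in the core because each one's only neighbour also has degree 1. The same extension rule would carry over to weighted shortest paths by adding the pendant edge's weight instead of 1, but that case is not shown here.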
