A Brief Survey of Algorithms, Architectures, and Challenges toward Extreme-scale Graph Analytics

The notion of networks is inherent in the structure, function, and behavior of the natural and engineered world that surrounds us. Consequently, graph models and methods have assumed a prominent role in this modern era of Big Data and are taking center stage in the discovery pipelines of various data-driven scientific domains. In this paper, we present a brief review of the state of the art in parallel graph analytics, focusing particularly on iterative graph algorithms and their implementation on modern-day multicore/manycore architectures. Iterative graph algorithms cover a broad range of graph operations of varying complexity, from simpler routines such as Breadth-First Search (BFS), to polynomially solvable problems such as shortest path computations, to NP-Hard problems such as community detection and graph coloring. We cover a set of common algorithmic abstractions used in implementing such iterative graph algorithms, state the challenges of parallelization on contemporary parallel platforms (including commodity multicores and emerging manycore platforms), and describe a set of approaches that have led to efficient implementations. We also report on advances in manycore architectural frameworks that have found application in parallel graph analytics. We conclude by identifying potential research directions, opportunities, and challenges that lie ahead on the path toward enabling graph analytics at exascale.
