LiteTE: Lightweight, Communication-Efficient Distributed-Memory Triangle Enumerating

Distributed-memory triangle enumeration has attracted considerable interest because of its potential to process huge graphs quickly. However, existing algorithms are slow because of high communication cost and load imbalance. To address these problems, we propose LiteTE, a lightweight, communication-efficient triangle enumeration scheme. To reduce communication cost, LiteTE introduces several techniques, including a graph partitioning method that fully leverages the large memory of commodity servers and the high bandwidth of modern networks, and a fast broadcast algorithm that effectively utilizes the bidirectional bandwidth of cables and the aggregate bandwidth of clusters. To reduce load imbalance, LiteTE introduces techniques at three levels: a codesign of graph partitioning and partition-level load balancing, a decentralized dynamic node-level load balancing technique, and a chunk-based lock-free work-stealing technique, all of which are lightweight and incur little or no communication cost. Experimental results show that LiteTE considerably reduces communication cost and load imbalance and achieves much better setup time, runtime, scalability, and load balance than state-of-the-art algorithms. On a small-scale cluster, LiteTE enumerates the 15 trillion triangles in a graph of 92 billion edges in 15 minutes, while other algorithms fail to complete.
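For readers unfamiliar with the underlying task, the following is a minimal single-machine sketch of triangle enumeration, that is, listing every triangle of an undirected graph exactly once. It is not LiteTE's algorithm: it uses the common degree-ordering and neighbor-set-intersection approach, and the function name `enumerate_triangles` and the edge-list input format are assumptions made for illustration only.

```python
from collections import defaultdict


def enumerate_triangles(edges):
    """Yield each triangle of an undirected simple graph exactly once.

    Illustrative baseline only (hypothetical helper, not LiteTE's method).
    `edges` is an iterable of (u, v) pairs; duplicates and self-loops are ignored.
    """
    # Build undirected adjacency sets.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # skip self-loops
            adj[u].add(v)
            adj[v].add(u)

    # Rank vertices by (degree, id) and orient each edge from the
    # lower-ranked endpoint to the higher-ranked one. This bounds the
    # out-degree of high-degree vertices and removes duplicate reports.
    rank = {v: (len(adj[v]), v) for v in adj}
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    # A triangle (u, v, w) with rank u < v < w is found only when u is the
    # lowest-ranked vertex, v the middle one, and w lies in both out-sets.
    for u in out:
        for v in out[u]:
            for w in out[u] & out[v]:
                yield (u, v, w)
```

A distributed scheme has to split this work across machines, which is where the partitioning, broadcast, and load-balancing concerns described in the abstract arise.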
