Experimental Design of Work Chunking for Graph Algorithms on High Bandwidth Memory Architectures

High Bandwidth Memory (HBM) is an additional memory layer between DDR and cache; it currently exists in the form of Multi-Channel DRAM (MCDRAM) on the Intel Knights Landing (KNL) manycore architecture. Its purpose is to increase available memory bandwidth and thereby maximize processor throughput. This work explores optimizing the label propagation community detection algorithm on KNL, as this algorithm and its variants are widely used for community detection. The algorithm's processing pattern also represents a broader class of vertex-centric programs. As HBM becomes more common in new HPC systems, it is important to determine how best to exploit this memory layer for memory-starved graph and combinatorial algorithms. This work experimentally examines breaking up the algorithmic work into HBM-resident chunks, along with a parametric study of associated variations and optimizations. In general, we find that our chunking methodology does not harm solution quality and can improve time to solution for label propagation. We believe these results are likely to generalize to other vertex-centric algorithms as well.
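As a rough illustration of the idea of work chunking, the sketch below runs label propagation over a graph one fixed-size block of vertices at a time. It is a minimal sketch under stated assumptions, not the implementation evaluated in this work: the function name `label_propagation_chunked`, the `chunk_size` and `max_sweeps` parameters, the dictionary-based adjacency format, and the smallest-label tie-break are all hypothetical choices made for the example. Plain Python cannot place data in MCDRAM, so the point where a chunk would be staged into fast memory is only marked with a comment.

```python
from collections import Counter


def label_propagation_chunked(adj, chunk_size, max_sweeps=20):
    """Label propagation that visits vertices one chunk at a time.

    adj: dict mapping each vertex to a list of its neighbours.
    chunk_size: number of vertices processed per chunk (hypothetical knob).
    """
    labels = {v: v for v in adj}      # every vertex starts in its own community
    vertices = sorted(adj)            # fixed visiting order for reproducibility

    for _ in range(max_sweeps):
        changed = 0
        for start in range(0, len(vertices), chunk_size):
            chunk = vertices[start:start + chunk_size]
            # Assumption: on real hardware, the adjacency data for this chunk
            # would be copied into high bandwidth memory at this point.
            for v in chunk:
                if not adj[v]:
                    continue          # isolated vertices keep their own label
                counts = Counter(labels[u] for u in adj[v])
                best = max(counts.values())
                # Deterministic tie-break: smallest label among the most frequent.
                new_label = min(l for l, c in counts.items() if c == best)
                if new_label != labels[v]:
                    labels[v] = new_label
                    changed += 1
        if changed == 0:              # converged: a full sweep changed nothing
            break
    return labels


# Two disjoint triangles should end up as two separate communities.
example_graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1],
    3: [4, 5], 4: [3, 5], 5: [3, 4],
}
example_labels = label_propagation_chunked(example_graph, chunk_size=2)
```

Because this sketch keeps the same vertex visiting order regardless of `chunk_size`, chunking here changes only how the work is grouped, not the labels produced. A real implementation may reorder or parallelize work within chunks, in which case solution quality would need to be checked experimentally.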
