Compressing Networks with Super Nodes

Community detection is a widely used technique for identifying groups of nodes in a network based on similarities in their connectivity patterns. To make community detection tractable in large networks, we recast the network as a smaller network of ‘super nodes’, where each super node comprises one or more nodes of the original network. This super node representation then serves as the input to standard community detection algorithms. To define the seeds, or centers, of the super nodes, we apply the ‘CoreHD’ ranking, a technique from network dismantling and decycling problems. We test our approach with two common methods for community detection: modularity maximization with the Louvain algorithm and maximum likelihood optimization for fitting a stochastic block model. Our results show that community detection on the compressed network of super nodes is significantly faster and produces partitions that are more aligned with the local network connectivity and more stable across multiple (stochastic) runs, both within and between community detection algorithms, while still overlapping well with the results obtained on the full network.
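The compression pipeline described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names (`two_core`, `corehd_seeds`, `assign_to_seeds`, `super_node_graph`) are hypothetical. The seed ranking follows the usual description of CoreHD (repeatedly remove the highest-degree node of the 2-core). Two steps are assumptions made for illustration, since the abstract does not specify them: attaching each remaining node to its nearest seed by breadth-first search, and weighting super edges by the number of original edges between two super nodes.

```python
from collections import deque

def two_core(adj):
    """Return the nodes of the 2-core: repeatedly strip nodes of degree < 2."""
    alive = set(adj)
    deg = {u: len(adj[u]) for u in adj}
    queue = deque(u for u in adj if deg[u] < 2)
    while queue:
        u = queue.popleft()
        if u not in alive:
            continue
        alive.discard(u)
        for v in adj[u]:
            if v in alive:
                deg[v] -= 1
                if deg[v] < 2:
                    queue.append(v)
    return alive

def corehd_seeds(adj, k):
    """Pick up to k seeds in CoreHD order (naive version that recomputes the 2-core)."""
    work = {u: set(vs) for u, vs in adj.items()}  # copy; the input is not modified
    seeds = []
    while len(seeds) < k:
        core = two_core(work)
        if not core:
            break  # fewer than k seeds if the 2-core empties first
        # Highest degree inside the 2-core; ties go to the smallest node label.
        u = max(sorted(core), key=lambda n: len(work[n] & core))
        seeds.append(u)
        for v in work.pop(u):
            work[v].discard(u)
    return seeds

def assign_to_seeds(adj, seeds):
    """Assumption: attach every node to its nearest seed via multi-source BFS."""
    label = {s: s for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in label:
                label[v] = label[u]
                queue.append(v)
    for u in adj:  # nodes unreachable from any seed become their own super node
        label.setdefault(u, u)
    return label

def super_node_graph(adj, label):
    """Assumption: super edge weight = number of original edges between super nodes."""
    weights = {}
    for u in adj:
        for v in adj[u]:
            if u < v and label[u] != label[v]:
                key = tuple(sorted((label[u], label[v])))
                weights[key] = weights.get(key, 0) + 1
    return weights

# Toy network: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
seeds = corehd_seeds(adj, k=2)               # [2, 3]
labels = assign_to_seeds(adj, seeds)         # triangle nodes map to 2 or 3
weights = super_node_graph(adj, labels)      # {(2, 3): 1}
```

The weighted super node graph in `weights` is what would be handed to a community detection routine such as Louvain or a stochastic block model fit. The partition of super nodes can then be mapped back to the original nodes through `labels`.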
