Structure-preserving sparsification methods for social networks

Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsification algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of edge sparsification methods on a diverse set of network properties. We show that these methods can be understood as rating edges by importance and then filtering by these scores, either globally or locally. We also show that applying a local filtering technique improves the preservation of all kinds of properties. In addition, we propose a new sparsification method (Local Degree) which preserves edges leading to local hub nodes. All methods are evaluated on a set of social networks from Facebook, Google+, Twitter and LiveJournal with respect to network properties including diameter, connected components, community structure, multiple node centrality measures and the behavior of epidemic simulations. To assess the preservation of the community structure, we also include experiments on synthetically generated networks with ground truth communities. Experiments with our implementations of the sparsification methods (included in the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20% of the original set of edges for sparse graphs with a reasonable density. The experimental results allow us to differentiate the behavior of the methods and show which method is suited to preserving which property. While our Local Degree method is best for preserving connectivity and short distances, other newly introduced local variants are best for preserving the community structure.
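
The idea of rating edges and filtering them locally can be illustrated with a minimal sketch in the spirit of Local Degree. The abstract only says that the method keeps edges leading to local hub nodes, so the concrete rule used here is an assumption: each node keeps the edges to its floor(deg^alpha) highest-degree neighbors, and an edge survives if either endpoint keeps it. The function name `local_degree_sparsify` and the parameter `alpha` are hypothetical and are not NetworKit's API.

```python
import math
from collections import defaultdict


def local_degree_sparsify(edges, alpha=0.5):
    """Return the kept edges of an undirected graph as a set of (u, v), u < v.

    Assumed rule (not stated in the abstract): every node keeps the edges
    to its floor(deg ** alpha) highest-degree neighbors, at least one.
    """
    # Build an adjacency structure, ignoring self-loops and duplicates.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    kept = set()
    for u, nbrs in adj.items():
        # Local filter: the budget depends on the node's own degree.
        budget = max(1, math.floor(len(nbrs) ** alpha))
        # Rate neighbors by degree (hubs first); break ties by node id.
        ranked = sorted(nbrs, key=lambda w: (-len(adj[w]), w))
        for w in ranked[:budget]:
            kept.add((min(u, w), max(u, w)))
    return kept


if __name__ == "__main__":
    # A hub (node 0) with four neighbors, plus one edge between two of them.
    edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
    print(sorted(local_degree_sparsify(edges, alpha=0.5)))
```

In this toy graph every node retains its link to the hub, so the result stays connected while the edge between the two low-degree nodes is dropped. With `alpha=1.0` every edge is kept; smaller values produce sparser graphs.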
