Reducing large graphs to small supergraphs: a unified approach

Summarizing a large graph with a much smaller graph is critical for applications like speeding up intensive graph algorithms and interactive visualization. In this paper, we propose CONditional Diversified Network Summarization (CondeNSe), a Minimum Description Length-based method that summarizes a given graph with approximate “supergraphs” conditioned on a set of diverse, predefined structural patterns. CondeNSe features a unified pattern discovery module and a set of effective summary assembly methods, including a powerful parallel approach, k-Step, that creates high-quality summaries not biased toward specific graph structures. By leveraging CondeNSe’s ability to efficiently handle overlapping structures, we contribute a novel evaluation of seven existing clustering techniques by going beyond classic cluster quality measures. Extensive empirical evaluation on real networks in terms of compression, runtime, and summary quality shows that CondeNSe finds 30–50% more compact summaries than baselines, with up to 75–90% fewer structures and equally good node coverage.
