An Empirical Comparison of the Summarization Power of Graph Clustering Methods

How do graph clustering techniques compare with respect to their summarization power? How well can they summarize a million-node graph with a few representative structures? Graph clustering or community detection algorithms can summarize a graph in terms of coherent and tightly connected clusters. In this paper, we compare and contrast five techniques: METIS, Louvain, spectral clustering, SlashBurn, and KCBC, our proposed k-core-based clustering method. Unlike prior work that focuses on various measures of cluster quality, we use vocabulary structures that often appear in real graphs and the Minimum Description Length (MDL) principle to obtain a graph summary per clustering method. Our main contributions are: (i) Formulation: We propose a summarization-based evaluation of clustering methods. Our method, VOG-OVERLAP, concisely summarizes graphs in terms of their important structures, favoring summaries with small edge overlap and large node/edge coverage; (ii) Algorithm: We introduce KCBC, a graph decomposition technique at the heart of which lies the k-core algorithm; (iii) Evaluation: We compare the summarization power of the five clustering techniques on large real graphs, and analyze their compression performance, summary statistics, and runtimes.
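
For readers unfamiliar with the k-core algorithm mentioned in contribution (ii), the following is a minimal Python sketch of standard k-core decomposition by repeated removal of a minimum-degree node. It is not KCBC itself, whose details are not given in this abstract; the function name `core_numbers` and the edge-list input format are assumptions made for illustration. A node's core number is the largest k such that the node belongs to a subgraph in which every node has degree at least k.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph.

    `edges` is an iterable of (u, v) pairs; self-loops are ignored.
    Illustrative only: this is plain k-core peeling, not the KCBC method.
    """
    # Build an adjacency map from the edge list.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        # (A linear scan keeps the sketch short; bucket queues are faster.)
        node = min(remaining, key=degree.__getitem__)
        # The core level never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        # Removing the node lowers the degree of its surviving neighbors.
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


if __name__ == "__main__":
    # A triangle (1, 2, 3) with a pendant node 4 attached to node 3.
    print(core_numbers([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

In the example graph, the triangle nodes receive core number 2 and the pendant node receives core number 1. A decomposition method built on k-cores could use such core numbers to peel off dense regions of the graph, though how KCBC does so is described in the paper body rather than here.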
