Scalable Motif-aware Graph Clustering

We develop new methods based on graph motifs for graph clustering, allowing more efficient detection of communities within networks. We focus on triangles within graphs, but our techniques extend to other clique motifs as well. Our intuition, which previous works have suggested but not formalized in this way, is that triangles are a better signature of community than edges. We therefore generalize the notion of conductance for a graph to triangle conductance, where each edge is weighted by the number of triangles that contain it. This methodology allows us to develop variations of several existing clustering techniques, including spectral clustering, that minimize the number of triangles split by a cluster rather than the number of edges cut by it. We provide theoretical results in a planted partition model to demonstrate the potential of triangle conductance in clustering problems. We then show experimentally the effectiveness of our methods on multiple applications in machine learning and graph mining.
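To make the edge-weighting idea concrete, the following is a minimal sketch, not the authors' implementation. It weights each edge by the number of triangles containing it and then evaluates a weighted conductance for a candidate cluster. The function names `triangle_weights` and `weighted_conductance`, the toy graph, and the exact normalization (weighted cut divided by the smaller weighted volume) are assumptions made for illustration; the abstract does not specify them.

```python
def triangle_weights(edges):
    """Map each undirected edge (u, v) with u < v to the number of triangles containing it.

    Assumes node labels are mutually comparable (e.g. integers).
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    weights = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Every common neighbour of u and v closes one triangle on (u, v).
                weights[(u, v)] = len(adj[u] & adj[v])
    return weights


def weighted_conductance(weights, cluster):
    """Weighted cut of `cluster` divided by the smaller weighted volume (assumed normalization)."""
    cluster = set(cluster)
    cut = 0
    vol_in = 0
    vol_out = 0
    for (u, v), w in weights.items():
        if (u in cluster) != (v in cluster):
            cut += w
        for x in (u, v):
            if x in cluster:
                vol_in += w
            else:
                vol_out += w
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else float("inf")


# Toy graph (hypothetical): two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
w = triangle_weights(edges)
unit = {e: 1 for e in w}  # ordinary edge conductance uses unit weights

tri_phi = weighted_conductance(w, {0, 1, 2})
edge_phi = weighted_conductance(unit, {0, 1, 2})
```

In this toy graph the bridge edge lies in no triangle, so it receives weight zero and the cut separating the two triangles costs nothing under the triangle weighting, whereas the same cut still pays for one edge under unit weights.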