Efficient Graphlet Counting for Large Networks

From social science to biology, numerous applications rely on graphlets for intuitive and meaningful characterization of networks at both the global macro-level and the local micro-level. Although graphlets have had considerable success and impact in a variety of domains, there has yet to be a fast and efficient approach for computing the frequencies of these subgraph patterns. Existing methods do not scale to large networks with millions of nodes and edges, which impedes the application of graphlets to new problems that require large-scale network analysis. To address these problems, we propose a fast, efficient, and parallel algorithm for counting graphlets of size k ∈ {3, 4} nodes that takes only a fraction of the time required by current methods. The proposed graphlet counting algorithm leverages a number of proven combinatorial arguments for different graphlets. For each edge, we count a few graphlets, and with these counts along with the combinatorial arguments, we obtain the exact counts of the others in constant time. On a large collection of 300+ networks from a variety of domains, our graphlet counting strategies are on average 460x faster than current methods. This brings new opportunities to investigate the use of graphlets on much larger networks and newer applications, as we show in the experiments. To the best of our knowledge, this paper provides the largest graphlet computations to date as well as the largest systematic investigation, covering 300+ networks from a variety of domains.
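To illustrate the edge-centric idea described above, the following is a minimal sketch for the 3-node case only, not the paper's implementation. It assumes a simple undirected graph given as a node count and an edge list. The function name `count_3node_graphlets` and the output keys are hypothetical. For each edge, only triangles are counted explicitly (by set intersection); the remaining 3-node graphlet counts then follow from the endpoint degrees and the triangle count in constant time per edge.

```python
from math import comb


def count_3node_graphlets(n, edges):
    """Count induced 3-node graphlets in a simple undirected graph.

    Hypothetical helper for illustration: n is the number of nodes
    (labelled 0..n-1) and edges is an iterable of (u, v) pairs.
    """
    # Build adjacency sets; self-loops and duplicate edges are ignored.
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    tri_sum = 0    # triangles, summed over edges (each counted 3 times)
    wedge_sum = 0  # open 2-stars, summed over edges (each counted 2 times)
    one_edge = 0   # triples with exactly one edge (each counted once)

    for u in range(n):
        for v in adj[u]:
            if u >= v:
                continue  # visit each undirected edge once
            du, dv = len(adj[u]), len(adj[v])
            # The only explicit counting step: triangles on edge (u, v).
            tri = len(adj[u] & adj[v])
            # Combinatorial step: neighbours of exactly one endpoint
            # (excluding u and v themselves) close an open 2-star.
            wedge_sum += (du - 1 - tri) + (dv - 1 - tri)
            # Nodes adjacent to neither endpoint; |N(u) ∪ N(v)| = du + dv - tri
            # and that union already contains u and v.
            one_edge += n - (du + dv - tri)
            tri_sum += tri

    triangles = tri_sum // 3
    wedges = wedge_sum // 2
    # Whatever remains of the C(n, 3) triples has no edges at all.
    empty = comb(n, 3) - triangles - wedges - one_edge
    return {
        "triangle": triangles,
        "2-star": wedges,
        "1-edge": one_edge,
        "independent": empty,
    }


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant node 3 attached to node 2.
    print(count_3node_graphlets(4, [(0, 1), (0, 2), (1, 2), (2, 3)]))
```

Because each edge is processed independently, the outer loop could in principle be split across workers, which is one plausible reading of how the approach parallelizes. The 4-node case described in the abstract relies on further combinatorial relations that are not reproduced here.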
