Condensed Graphs: A Generic Framework for Accelerating Subgraph Census Computation

Determining subgraph frequencies is at the core of several graph mining methodologies, such as discovering network motifs or computing graphlet degree distributions. Current state-of-the-art algorithms for this task either take advantage of common patterns emerging in the networks or target a set of specific subgraphs for which analytical calculations are feasible. Here, we propose a novel network-generic framework built around a new data structure, the Condensed Graph, that combines both of these approaches and generalizes them to support any subgraph topology and size. Furthermore, our methodology can use any enumeration-based census algorithm as a baseline and speed up its computation. We target simple topologies that allow us to skip several redundant and computationally heavy steps by using combinatorics. Across a broad set of real and synthetic networks, we achieve substantial improvements, with evidence of exponential speedup in our best cases, where these patterns represent up to 97% of the network.
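As a rough illustration of how combinatorics can replace enumeration for a simple topology, the sketch below counts star subgraphs formed by a hub and k of its degree-1 neighbours. It is a simplified example and not the Condensed Graph structure itself: the choice of leaf stars as the target pattern, the adjacency-dictionary representation, and all function names are assumptions made for this example.

```python
from itertools import combinations
from math import comb


def condense_leaves(adj):
    """Group degree-1 vertices by the single neighbour (hub) they attach to."""
    groups = {}
    for vertex, neighbours in adj.items():
        if len(neighbours) == 1:
            (hub,) = neighbours
            groups.setdefault(hub, []).append(vertex)
    return groups


def count_leaf_stars_enumerated(adj, k):
    """Baseline: walk through every set of k leaves sharing a hub (k >= 2)."""
    total = 0
    for group in condense_leaves(adj).values():
        total += sum(1 for _ in combinations(group, k))
    return total


def count_leaf_stars_combinatorial(adj, k):
    """Shortcut: a hub with n leaves contributes C(n, k) stars, no enumeration."""
    return sum(comb(len(group), k) for group in condense_leaves(adj).values())


# Hypothetical toy network: hub 0 has four leaves, hub 5 has two leaves.
toy = {
    0: {1, 2, 3, 4, 5},
    1: {0}, 2: {0}, 3: {0}, 4: {0},
    5: {0, 6, 7},
    6: {5}, 7: {5},
}

stars_2 = count_leaf_stars_combinatorial(toy, 2)  # C(4,2) + C(2,2)
stars_3 = count_leaf_stars_combinatorial(toy, 3)  # C(4,3) + C(2,3)
```

Both functions return the same counts, but the enumerated version does work proportional to the number of stars, while the combinatorial version does one binomial evaluation per hub. The gap widens quickly as the number of leaves per hub grows.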
