Large Scale Graph Representations for Subgraph Census

A subgraph census, which determines the frequency of smaller subgraphs in a network, is an important computational task at the heart of several graph mining algorithms. Here we focus on g-tries, an efficient state-of-the-art data structure. Its algorithm makes extensive use of the graph primitive that checks whether a certain edge exists. Like most past approaches, the original implementation used adjacency matrices to make this operation as fast as possible. However, this representation is very expensive in memory usage, which limits its applicability. In this paper we study a number of alternative approaches whose memory usage scales linearly with the number of edges. We conduct an extensive empirical study of these alternatives in order to find an efficient hybrid approach that combines the best representations. On average, we achieve a performance that is less than 50% slower than the adjacency matrix and almost 3 times faster than a naive binary search implementation, while being memory efficient and tunable for different memory restrictions.
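The trade-off behind the edge-existence primitive can be illustrated with a minimal Python sketch. It contrasts a dense adjacency matrix (constant-time check, memory quadratic in the number of nodes) with sorted adjacency lists searched by binary search (memory linear in nodes plus edges, logarithmic check). The `HybridGraph` class and its `max_dense_rows` parameter are hypothetical illustrations of how a memory budget could be spent on dense rows for high-degree nodes; they are assumptions for this example, not the hybrid representation evaluated in the paper. Edges are treated as directed pairs `(u, v)`.

```python
from bisect import bisect_left


class AdjacencyMatrix:
    """Dense n x n 0/1 matrix: O(1) edge check, but O(n^2) memory."""

    def __init__(self, n, edges):
        self.rows = [bytearray(n) for _ in range(n)]
        for u, v in edges:  # directed edge u -> v
            self.rows[u][v] = 1

    def has_edge(self, u, v):
        return self.rows[u][v] == 1


class SortedAdjacencyLists:
    """Sorted neighbour list per node: O(n + m) memory, O(log deg(u)) edge check."""

    def __init__(self, n, edges):
        self.adj = [[] for _ in range(n)]
        for u, v in edges:
            self.adj[u].append(v)
        for neighbours in self.adj:
            neighbours.sort()

    def has_edge(self, u, v):
        neighbours = self.adj[u]
        i = bisect_left(neighbours, v)  # binary search for v among u's neighbours
        return i < len(neighbours) and neighbours[i] == v


class HybridGraph(SortedAdjacencyLists):
    """Hypothetical hybrid: dense rows only for the highest-degree nodes.

    max_dense_rows is an illustrative memory knob, not a parameter from the paper.
    """

    def __init__(self, n, edges, max_dense_rows):
        super().__init__(n, edges)
        by_degree = sorted(range(n), key=lambda u: len(self.adj[u]), reverse=True)
        self.dense = {}
        for u in by_degree[:max_dense_rows]:
            row = bytearray(n)  # one dense row costs n bytes
            for v in self.adj[u]:
                row[v] = 1
            self.dense[u] = row

    def has_edge(self, u, v):
        row = self.dense.get(u)
        if row is not None:
            return row[v] == 1  # O(1) lookup for nodes given a dense row
        return super().has_edge(u, v)  # binary search for all other nodes
```

All three classes answer `has_edge(u, v)` identically; they differ only in memory use and lookup cost. Raising `max_dense_rows` moves the sketch toward the adjacency matrix, while setting it to zero reduces it to plain binary search over sorted lists.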
