Querying subgraph sets with g-tries

In this paper we present a universal methodology for finding all occurrences of a given set of subgraphs in a single larger graph. Past approaches either enumerate all possible subgraphs of a certain size or query a single subgraph at a time. We use g-tries, a data structure specialized in dealing with subgraph sets. G-tries store topological information in a tree that exposes common substructure. Using a specialized canonical form and symmetry-breaking conditions, a single non-redundant search for the entire set of subgraphs is possible. We report results of applying g-trie querying to different social networks, showing that we can efficiently find the occurrences of a set containing subgraphs of multiple sizes, outperforming previous methods.
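To make the idea of a tree that exposes common substructure concrete, the following is a minimal sketch and not the paper's implementation. It assumes undirected subgraphs given as adjacency matrices whose vertex order is already fixed (the paper's canonical form is not reproduced here), and the names `GTrie`, `GTrieNode`, `insert`, `node_count`, and `count_matches` are hypothetical. Each trie node at depth k stores how the k-th vertex connects to the earlier ones, so subgraphs that share a prefix share nodes. The search omits symmetry-breaking conditions, so each occurrence is reported once per automorphism of the subgraph.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, ...]


class GTrieNode:
    def __init__(self) -> None:
        # Children are keyed by the new vertex's connections to earlier vertices.
        self.children: Dict[Row, "GTrieNode"] = {}
        # Set when a stored subgraph ends at this node.
        self.label: Optional[str] = None


class GTrie:
    def __init__(self) -> None:
        self.root = GTrieNode()

    def insert(self, adj: List[List[int]], label: str) -> None:
        """Store a subgraph; vertex order is assumed to be canonical already."""
        node = self.root
        for k, row in enumerate(adj):
            key = tuple(row[:k])  # edges from vertex k to vertices 0..k-1
            node = node.children.setdefault(key, GTrieNode())
        node.label = label

    def node_count(self) -> int:
        """Number of trie nodes, excluding the root."""
        stack, total = [self.root], 0
        while stack:
            node = stack.pop()
            total += len(node.children)
            stack.extend(node.children.values())
        return total

    def count_matches(self, g: List[List[int]]) -> Dict[str, int]:
        """Count induced matches of every stored subgraph in one traversal.

        No symmetry breaking is applied, so each occurrence is counted
        once per automorphism of the matched subgraph.
        """
        counts: Dict[str, int] = {}

        def extend(node: GTrieNode, used: List[int]) -> None:
            if node.label is not None:
                counts[node.label] = counts.get(node.label, 0) + 1
            for key, child in node.children.items():
                for v in range(len(g)):
                    if v in used:
                        continue
                    # v must match both the edges and the non-edges in key.
                    if all(g[v][u] == bit for u, bit in zip(used, key)):
                        extend(child, used + [v])

        extend(self.root, [])
        return counts


trie = GTrie()
trie.insert([[0, 1, 1], [1, 0, 1], [1, 1, 0]], "triangle")
trie.insert([[0, 1, 1], [1, 0, 0], [1, 0, 0]], "path3")  # center vertex first
trie.insert([[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]], "K4")

# Triangle (vertices 0, 1, 2) with a pendant vertex 3 attached to vertex 0.
graph = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
counts = trie.count_matches(graph)
print(trie.node_count(), counts)
```

The three subgraphs have 3 + 3 + 4 = 10 vertices in total but occupy only 5 trie nodes, because the triangle is a prefix of the 4-clique and shares its first two vertices with the path. A partial match that fails at a shared node is discarded for every subgraph below it at once, which is where the saving over querying each subgraph separately comes from.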
