g-tries: an efficient data structure for discovering network motifs

In this paper we propose a novel specialized data structure that we call g-trie, designed to deal with collections of subgraphs. The main conceptual idea is akin to a prefix tree in the sense that we take advantage of common topology by constructing a multiway tree where the descendants of a node share a common substructure. We give algorithms to construct a g-trie, to list all stored subgraphs, and to find occurrences on another graph of the subgraphs stored in the g-trie. We evaluate the implementation of this structure and its associated algorithms on a set of representative benchmark biological networks in order to find network motifs. To assess the efficiency of our algorithms we compare their performance with other known network motif algorithms also implemented in the same common platform. Our results show that indeed, g-tries are a feasible, adequate and very efficient data structure for network motifs discovery, clearly outperforming previous algorithms and data structures.

[1]  Michio Kondoh,et al.  Building trophic modules into a persistent food web , 2008, Proceedings of the National Academy of Sciences.

[2]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[3]  Mark E. J. Newman,et al.  The Structure and Function of Complex Networks , 2003, SIAM Rev..

[4]  D. Lusseau,et al.  The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations , 2003, Behavioral Ecology and Sociobiology.

[5]  O. Sporns,et al.  Motifs in Brain Networks , 2004, PLoS biology.

[6]  Uri Alon,et al.  Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs , 2004, Bioinform..

[7]  Edward Fredkin,et al.  Trie memory , 1960, Commun. ACM.

[8]  Réka Albert,et al.  Conserved network motifs allow protein-protein interaction prediction , 2004, Bioinform..

[9]  D. Bu,et al.  Topological structure analysis of the protein-protein interaction network in budding yeast. , 2003, Nucleic acids research.

[10]  L. da F. Costa,et al.  Characterization of complex networks: A survey of measurements , 2005, cond-mat/0505185.

[11]  Ian Witten,et al.  Data Mining , 2000 .

[12]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[13]  George Karypis,et al.  Frequent subgraph discovery , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[14]  Sebastian Wernicke,et al.  Efficient Detection of Network Motifs , 2006, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[15]  Jiawei Han,et al.  Data Mining: Concepts and Techniques, Second Edition , 2006, The Morgan Kaufmann series in data management systems.

[16]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[17]  Albert-László Barabási,et al.  Aggregation of topological motifs in the Escherichia coli transcriptional regulatory network , 2004, BMC Bioinformatics.

[18]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[19]  Sebastian Wernicke,et al.  FANMOD: a tool for fast network motif detection , 2006, Bioinform..

[20]  F. Radicchi,et al.  Benchmark graphs for testing community detection algorithms. , 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[21]  Uri Alon,et al.  Coarse-graining and self-dissimilarity of complex networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[22]  S. Shen-Orr,et al.  Network motifs: simple building blocks of complex networks. , 2002, Science.

[23]  Joshua A. Grochow,et al.  Network Motif Discovery Using Subgraph Enumeration and Symmetry-Breaking , 2007, RECOMB.

[24]  Christos Faloutsos,et al.  Graph mining: Laws, generators, and algorithms , 2006, CSUR.