Biomolecular network motif counting and discovery by color coding

Protein–protein interaction (PPI) networks of many organisms share global topological features such as degree distribution, k-hop reachability, betweenness and closeness. Yet some of these networks can differ significantly from the others in their local structure: for example, the number of occurrences of specific network motifs can vary significantly among PPI networks. Counting network motifs is therefore a major challenge in comparing biomolecular networks. Recently developed algorithms have been able to count the number of induced occurrences of subgraphs with k ≤ 7 vertices. Yet no practical algorithm exists for counting non-induced occurrences, or for counting subgraphs with k ≥ 8 vertices. Counting non-induced occurrences of network motifs is not only challenging but also quite desirable, as available PPI networks include several false interactions and miss many others. In this article, we show how to apply the ‘color coding’ technique to count non-induced occurrences of subgraph topologies in the form of trees and bounded treewidth subgraphs. Our algorithm can count all occurrences of a motif G′ with k vertices in a network G with n vertices in time polynomial in n, provided k = O(log n). We use our algorithm to obtain ‘treelet’ distributions for k ≤ 10 of available PPI networks of unicellular organisms (Saccharomyces cerevisiae, Escherichia coli and Helicobacter pylori), which are all quite similar, and of a multicellular organism (Caenorhabditis elegans), which is significantly different. Furthermore, the treelet distributions of the unicellular organisms are similar to that obtained by the ‘duplication model’ but are quite different from that of the ‘preferential attachment model’. The treelet distribution is robust with respect to sparsification at a bait/edge coverage of 70%, but differences can be observed when bait/edge coverage drops to 50%. Contact: cenk@cs.sfu.ca
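To make the color coding idea concrete, the following is a minimal sketch for the simplest tree topology, a path on k vertices. It is not the authors' treelet algorithm; the function names `count_colorful_paths` and `estimate_paths` and the adjacency-dictionary input format are assumptions chosen for illustration. Each vertex receives one of k colors uniformly at random, a dynamic program over color subsets counts the "colorful" paths (all k colors distinct), and the count is rescaled by the probability k!/k^k that a fixed path becomes colorful.

```python
import math
import random


def count_colorful_paths(adj, colors, k):
    """Count non-induced simple paths on k vertices whose colors are all distinct.

    adj    : dict mapping each vertex to the set of its neighbors (undirected).
    colors : dict mapping each vertex to a color in range(k).
    """
    if k == 1:
        return len(adj)
    # table[v][mask] = number of colorful directed paths ending at v
    # whose vertices use exactly the colors in the bitmask `mask`.
    table = {v: {1 << colors[v]: 1} for v in adj}
    for _ in range(k - 1):
        extended = {v: {} for v in adj}
        for u in adj:
            for mask, count in table[u].items():
                for v in adj[u]:
                    bit = 1 << colors[v]
                    if mask & bit:
                        continue  # color already used, so the path would not be colorful
                    new_mask = mask | bit
                    extended[v][new_mask] = extended[v].get(new_mask, 0) + count
        table = extended
    total = sum(c for v in adj for c in table[v].values())
    # Every undirected path was counted once from each of its two endpoints.
    return total // 2


def estimate_paths(adj, k, trials, seed=0):
    """Estimate the number of k-vertex paths by averaging over random colorings."""
    rng = random.Random(seed)
    p_colorful = math.factorial(k) / k ** k  # chance that a fixed path is colorful
    total = 0
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, colors, k)
    return total / (trials * p_colorful)


# Small illustrative graph: a triangle 1-2-3 with a pendant vertex 0 attached to 1.
example_graph = {0: {1}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2}}
estimate = estimate_paths(example_graph, k=3, trials=2000)
```

The dynamic program touches at most 2^k color subsets per vertex, which is why the running time stays polynomial in n when k = O(log n). Because distinct colors force distinct vertices, no explicit check for repeated vertices is needed, and paths through the triangle are counted even though they are not induced.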
