How threshold behaviour affects the use of subgraphs for network comparison

Motivation: A wealth of protein–protein interaction (PPI) data has recently become available. These data are organized as PPI networks, and an efficient and biologically meaningful method for comparing such networks is needed. As a first step, we compare observed networks to established network models in terms of small subgraph counts, as these are conjectured to relate to functional modules in the PPI network. We employ the software tool GraphCrunch with the Graphlet Degree Distribution Agreement (GDDA) score to examine the use of such counts for network comparison.

Results: Our results show that the GDDA score depends strongly on the number of edges and vertices of the networks being considered. This should be taken into account when testing the fit of models. We provide a method, based on non-parametric tests, for assessing the statistical significance of the fit between random graph models and biological networks. Using this method we examine the fit of Erdős–Rényi (ER), ER with fixed degree distribution and geometric (3D) models to PPI networks. Under these rigorous tests none of these models fits the PPI networks. The GDDA score is not stable in the region of graph density relevant to current PPI networks. We hypothesize that this instability arises because the networks under consideration have a graph density in the threshold region for the appearance of small subgraphs. This is true for both geometric (3D) and ER random graph models. Such threshold behaviour may be linked to the robustness and efficiency properties of the PPI networks.

Contact: tiago@stats.ox.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
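The threshold behaviour referred to in the Results can be illustrated with a standard fact from random graph theory: in an ER graph G(n, p), the expected number of triangles is C(n, 3) p^3, which moves from near zero to many copies as p crosses the order of 1/n. The sketch below is not taken from the paper or from GraphCrunch; the function name `expected_triangles` and the choice n = 1000 are assumptions made purely for illustration.

```python
from math import comb


def expected_triangles(n: int, p: float) -> float:
    """Expected number of triangles in an Erdős–Rényi graph G(n, p).

    Each of the C(n, 3) vertex triples forms a triangle with
    probability p**3, so the expectation is C(n, 3) * p**3.
    (Hypothetical helper, used here only for illustration.)
    """
    return comb(n, 3) * p ** 3


if __name__ == "__main__":
    n = 1000  # illustrative network size, not a value from the paper
    # Scan edge probabilities p = c / n around the triangle threshold p ~ 1/n.
    for c in (0.1, 1.0, 10.0):
        p = c / n
        print(f"p = {p:.4f}  expected triangles = {expected_triangles(n, p):.6f}")
```

A tenfold change in edge density near p = 1/n changes the expected triangle count by a factor of a thousand. This suggests why a score built on small subgraph counts could be sensitive to density in this region.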
