Assessing Significance of Connectivity and Conservation in Protein Interaction Networks

Computational and comparative analysis of protein-protein interaction (PPI) networks enables understanding of the modular organization of the cell through identification of functional modules and protein complexes. These analysis techniques generally rely on topological features such as connectedness, based on the premise that functionally related proteins are likely to interact densely and that these interactions follow similar evolutionary trajectories. Significant recent work, in our lab and in others, has focused on efficient algorithms for identifying modules and their conservation. Application of these methods to a variety of networks has yielded novel biological insights. In spite of these algorithmic advances, the development of a comprehensive infrastructure for interaction databases is in relative infancy compared to corresponding sequence analysis tools such as BLAST and CLUSTAL. One critical component of this infrastructure is a measure of the statistical significance of a match or a dense subcomponent. Corresponding sequence-based measures such as E-values are key components of sequence matching tools. In the absence of an analytical measure, conventional methods rely on computer simulations based on ad-hoc models for quantifying significance. To the best of our knowledge, this paper presents the first effort aimed at analytically quantifying the statistical significance of dense components and matches in reference model graphs. We consider two reference graph models: a G(n,p) model, in which each pair of nodes has an identical likelihood, p, of sharing an edge, and a two-level G(n,p) model, which accounts for the high-degree hub nodes that generally occur in PPI networks. We argue that, when the value of p is chosen conservatively, the G(n,p) model dominates the power-law graph model that is often used to model PPI networks. We also propose a method for evaluating statistical significance based on the results derived from this analysis, and demonstrate the use of these measures for assessing significant structures in PPI networks. Experiments performed on a rich collection of PPI networks show that the proposed model provides a reliable means of evaluating the statistical significance of dense patterns in these networks.
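
To make the G(n,p) reference model concrete, the following Python sketch computes a simple first-moment (union) bound on the probability that a G(n,p) graph contains some k-node subgraph with edge density at least rho. This is a standard textbook bound used here only as an illustration, not the analysis derived in the paper; the function names and the parameters k and rho are assumptions introduced for this example.

```python
import math

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_binom_tail(m, p, x):
    """log P[Bin(m, p) >= x] for 0 < p < 1, summed exactly in log space."""
    x = max(0, x)
    if x > m:
        return float("-inf")
    terms = [log_comb(m, i) + i * math.log(p) + (m - i) * math.log1p(-p)
             for i in range(x, m + 1)]
    top = max(terms)
    # log-sum-exp keeps the sum stable when individual terms are tiny
    return top + math.log(sum(math.exp(t - top) for t in terms))

def log_expected_dense_subgraphs(n, k, p, rho):
    """log of the expected number of k-node subsets of G(n,p) whose induced
    subgraph has at least rho * C(k,2) edges (illustrative, hypothetical helper).

    By Markov's inequality, min(1, exp(result)) upper-bounds the probability
    that at least one such dense subgraph exists.
    """
    pairs = k * (k - 1) // 2
    # small tolerance guards against floating-point overshoot in rho * pairs
    need = math.ceil(rho * pairs - 1e-9)
    return log_comb(n, k) + log_binom_tail(pairs, p, need)

if __name__ == "__main__":
    # Hypothetical parameters: 5000 proteins, edge probability 0.001,
    # and a 10-protein pattern with at least 50% of possible edges present.
    log_e = log_expected_dense_subgraphs(n=5000, k=10, p=0.001, rho=0.5)
    print("log expected count:", log_e)
    print("bound on P[pattern exists]:", min(1.0, math.exp(log_e)))
```

A small value of the bound suggests that a pattern of that size and density would be unlikely to arise by chance under G(n,p) with the chosen p; the bound is loose and ignores the hub structure that the two-level model is meant to capture.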