Assessing Significance of Connectivity and Conservation in Protein Interaction Networks

Comparative analyses of cellular interaction networks enable understanding of the cell's modular organization through identification of functional modules and complexes. These techniques often rely on topological features such as connectedness and density, based on the premise that functionally related proteins are likely to interact densely and that these interactions follow similar evolutionary trajectories. Significant recent work has focused on efficient algorithms for identifying such functional modules and their conservation. In spite of these algorithmic advances, the development of a comprehensive infrastructure for interaction databases is in relative infancy compared to corresponding sequence analysis tools. One critical and as yet unresolved aspect of this infrastructure is a measure of the statistical significance of a match or of a dense subcomponent. In the absence of analytical measures, conventional methods quantify significance through computationally expensive simulations based on ad hoc models. In this paper, we present techniques for analytically quantifying the statistical significance of dense components in reference model graphs. We consider two reference models: a G(n, p) model, in which each pair of nodes in a graph has an identical likelihood, p, of sharing an edge, and a two-level G(n, p) model, which accounts for the high-degree hub nodes generally observed in interaction networks. Experiments performed on a rich collection of protein-protein interaction (PPI) networks show that the proposed model provides a reliable means of evaluating the statistical significance of dense patterns in these networks. We also adapt existing state-of-the-art network clustering algorithms by using our statistical significance measure as an optimization criterion. Comparison of the resulting module identification algorithm, SIDES, with existing methods shows that SIDES outperforms them in terms of the sensitivity and specificity of identified clusters with respect to available GO annotations.
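
The abstract does not give the analytical measures themselves. The following is only a rough sketch of the kind of quantity involved, not the paper's method. Under G(n, p), the number of edges induced by a fixed set of k nodes follows Binomial(C(k, 2), p), so the probability that the set induces at least m edges is a binomial tail. Multiplying that tail by C(n, k) gives a union (Bonferroni-style) upper bound on the chance that any k-node set in the whole graph is at least that dense. The two-level variant assumes a hypothetical parameterization in which hub-hub, hub/non-hub, and non-hub pairs connect with separate probabilities p_hh, p_hl, and p_ll. All function names here are illustrative.

```python
from math import comb, exp, lgamma, log


def binomial_pmf(trials: int, p: float) -> list[float]:
    """P(X = i) for X ~ Binomial(trials, p), i = 0..trials, computed in log space."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_p, log_q = log(p), log(1.0 - p)
    log_norm = lgamma(trials + 1)
    return [
        exp(log_norm - lgamma(i + 1) - lgamma(trials - i + 1)
            + i * log_p + (trials - i) * log_q)
        for i in range(trials + 1)
    ]


def convolve(a: list[float], b: list[float]) -> list[float]:
    """Distribution of the sum of two independent counts, given their pmfs."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def tail(pmf: list[float], m: int) -> float:
    """P(X >= m) from a pmf list; 0.0 if m exceeds the support."""
    return min(1.0, sum(pmf[max(m, 0):]))


def gnp_subgraph_tail(k: int, m: int, p: float) -> float:
    """P(a fixed set of k nodes induces >= m edges) in G(n, p)."""
    return tail(binomial_pmf(comb(k, 2), p), m)


def two_level_subgraph_tail(h: int, l: int, m: int,
                            p_hh: float, p_hl: float, p_ll: float) -> float:
    """Same tail for a set of h hubs and l non-hubs.

    Hypothetical two-level parameterization (an assumption, not the paper's):
    hub-hub pairs connect with p_hh, hub/non-hub pairs with p_hl,
    and non-hub pairs with p_ll, all independently.
    """
    pmf = convolve(
        convolve(binomial_pmf(comb(h, 2), p_hh), binomial_pmf(h * l, p_hl)),
        binomial_pmf(comb(l, 2), p_ll),
    )
    return tail(pmf, m)


def gnp_union_bound(n: int, k: int, m: int, p: float) -> float:
    """Upper bound on P(some k-subset of an n-node G(n, p) graph has >= m edges)."""
    t = gnp_subgraph_tail(k, m, p)
    if t == 0.0:
        return 0.0
    # Combine in log space: comb(n, k) can be too large to convert to a float.
    log_bound = log(comb(n, k)) + log(t)
    return 1.0 if log_bound >= 0.0 else exp(log_bound)


if __name__ == "__main__":
    # Illustrative numbers only: a 10-node set with 30 internal edges
    # in a sparse 5000-node graph with edge probability 0.001.
    print(gnp_union_bound(5000, 10, 30, 0.001))
```

Working in log space keeps the binomial coefficients from overflowing for larger subgraphs. The union bound is conservative and can become loose because overlapping candidate subsets are counted separately.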
