Integration of Genomic Data for Inferring Protein Complexes from Global Protein–Protein Interaction Networks

Protein-protein interactions (PPIs) play crucial roles in virtually every aspect of cellular function within an organism. One important objective of modern biology is the extraction of functional modules, such as protein complexes from global protein interaction networks. This paper describes how seven genomic features and four experimental interaction data sets were combined using a Bayesian-networks-based data integration approach to infer PPI networks in yeast. Greater coverage and higher accuracy were achieved than in previous high-throughput studies of PPI networks in yeast. A Markov clustering algorithm was then used to extract protein complexes from the inferred protein interaction networks. The quality of the computed complexes was evaluated using the hand-curated complexes from the Munich Information Center for Protein Sequences database and gene-ontology-driven semantic similarity. The results indicated that, by integrating multiple genomic information sources, a better clustering result was obtained in terms of both statistical measures and biological relevance.

[1]  Anton J. Enright,et al.  Detection of functional modules from protein interaction networks , 2003, Proteins.

[2]  M. Gerstein,et al.  A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data , 2003, Science.

[3]  Jonathan S Weissman,et al.  Multiple Gln/Asn-Rich Prion Domains Confer Susceptibility to Induction of the Yeast [PSI+] Prion , 2001, Cell.

[4]  C. Deane,et al.  Protein Interactions , 2002, Molecular & Cellular Proteomics.

[5]  M. Gerstein,et al.  Genomic analysis of essentiality within protein networks. , 2004, Trends in genetics : TIG.

[6]  P. Bork,et al.  Proteome survey reveals modularity of the yeast cell machinery , 2006, Nature.

[7]  Gary D. Bader,et al.  An automated method for finding molecular complexes in large protein interaction networks , 2003, BMC Bioinformatics.

[8]  Blatt,et al.  Superparamagnetic clustering of data. , 1998, Physical review letters.

[9]  Huiru Zheng,et al.  Predictive Integration of Gene Ontology-Driven Similarity and Functional Interactions , 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).

[10]  Yanjun Qi,et al.  Random Forest Similarity for Protein-Protein Interaction Prediction from Multiple Sources , 2004, Pacific Symposium on Biocomputing.

[11]  S. vanDongen Performance criteria for graph clustering and Markov cluster experiments , 2000 .

[12]  R. Ozawa,et al.  A comprehensive two-hybrid analysis to explore the yeast protein interactome , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[13]  M. Gerstein,et al.  Subcellular localization of the yeast proteome. , 2002, Genes & development.

[14]  James R. Knight,et al.  A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae , 2000, Nature.

[15]  Haidong Wang,et al.  Discovering molecular pathways from protein interaction and gene expression data , 2003, ISMB.

[16]  David J. Spiegelhalter,et al.  Probabilistic Networks and Expert Systems , 1999, Information Science and Statistics.

[17]  Matteo Pellegrini,et al.  Prolinks: a database of protein functional linkages derived from coevolution , 2004, Genome Biology.

[18]  M. Gerstein,et al.  Assessing the limits of genomic data integration for predicting protein networks. , 2005, Genome research.

[19]  Igor Jurisica,et al.  Protein complex prediction via cost-based clustering , 2004, Bioinform..

[20]  Jacques van Helden,et al.  Evaluation of clustering algorithms for protein-protein interaction networks , 2006, BMC Bioinformatics.

[21]  Yoshihiro Yamanishi,et al.  Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis , 2003, ISMB.

[22]  Ioannis Xenarios,et al.  DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions , 2002, Nucleic Acids Res..

[23]  Mark Gerstein,et al.  Bridging structural biology and genomics: assessing protein interaction data with known complexes. , 2002, Trends in genetics : TIG.

[24]  Ronald W. Davis,et al.  A genome-wide transcriptional analysis of the mitotic cell cycle. , 1998, Molecular cell.

[25]  Olivier Bodenreider,et al.  Gene expression correlation and gene ontology-based similarity: an assessment of quantitative relationships , 2004, 2004 Symposium on Computational Intelligence in Bioinformatics and Computational Biology.

[26]  H. Lehrach,et al.  A Human Protein-Protein Interaction Network: A Resource for Annotating the Proteome , 2005, Cell.

[27]  Dmitrij Frishman,et al.  MIPS: a database for genomes and protein sequences , 2000, Nucleic Acids Res..

[28]  Sean R. Collins,et al.  Toward a Comprehensive Atlas of the Physical Interactome of Saccharomyces cerevisiae*S , 2007, Molecular & Cellular Proteomics.

[29]  Yudong D. He,et al.  Functional Discovery via a Compendium of Expression Profiles , 2000, Cell.

[30]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems , 1988 .

[31]  Sean R. Collins,et al.  Global landscape of protein complexes in the yeast Saccharomyces cerevisiae , 2006, Nature.

[32]  Gary D Bader,et al.  Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry , 2002, Nature.

[33]  James R. Knight,et al.  A Protein Interaction Map of Drosophila melanogaster , 2003, Science.

[34]  S. vanDongen Graph Clustering by Flow Simulation , 2000 .

[35]  X Li,et al.  Amino-terminal protein processing in Saccharomyces cerevisiae is an essential function that requires two distinct methionine aminopeptidases. , 1995, Proceedings of the National Academy of Sciences of the United States of America.

[36]  M. Gerstein,et al.  Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. , 2004, Genome research.

[37]  B. Snel,et al.  Comparative assessment of large-scale data sets of protein–protein interactions , 2002, Nature.

[38]  Anton J. Enright,et al.  An efficient algorithm for large-scale detection of protein families. , 2002, Nucleic acids research.

[39]  S. L. Wong,et al.  A Map of the Interactome Network of the Metazoan C. elegans , 2004, Science.

[40]  M. Gerstein,et al.  Relating whole-genome expression data with protein-protein interactions. , 2002, Genome research.

[41]  Toshihisa Takagi,et al.  Prediction of protein-protein interactions using support vector machines , 2004, Proceedings. Fourth IEEE Symposium on Bioinformatics and Bioengineering.

[42]  P. Bork,et al.  Functional organization of the yeast proteome by systematic analysis of protein complexes , 2002, Nature.