Utilizing Both Topological and Attribute Information for Protein Complex Identification in PPI Networks

Many computational approaches for identifying protein complexes in protein-protein interaction (PPI) networks rely only on network topology and ignore the attributes of the proteins. Because the attributes of proteins within a complex may also be related to one another, we have developed an algorithm, PCIA, that considers both attribute information and network topology when identifying protein complexes. Given a PPI network, PCIA first retrieves the attributes of its proteins from the Gene Ontology databases. It then computes a Degree of Association measure for each pair of interacting proteins to quantify how strongly their attribute values are associated. Based on this measure, PCIA discovers dense graph clusters consisting of proteins whose attribute values are significantly more closely associated with each other. PCIA has been tested with real data, and the experimental results suggest that the attributes of proteins in the same complex are associated with each other and, therefore, that protein complexes can be identified more accurately when protein attributes are taken into consideration.
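The two-stage idea described above (score each interacting pair by attribute association, then grow dense clusters from well-associated pairs) can be illustrated with a minimal sketch. This is not the PCIA implementation: the abstract does not define the Degree of Association measure, so a plain Jaccard overlap of annotation sets stands in for it, and the greedy seed-and-grow step, the `threshold` parameter, all function names, and the toy proteins and term labels are assumptions made purely for illustration.

```python
def jaccard(a, b):
    """Stand-in association score (NOT the paper's Degree of Association):
    overlap of two annotation sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weight_edges(edges, annotations):
    """Attach the stand-in association score to every interacting pair."""
    return {
        frozenset((u, v)): jaccard(annotations.get(u, set()), annotations.get(v, set()))
        for u, v in edges
    }


def find_clusters(edges, annotations, threshold=0.5):
    """Greedy sketch: seed on the best-associated edges, then add neighbours
    whose mean association with the current cluster meets the threshold.
    Pairs that do not interact count as 0, so growth favours dense groups."""
    weights = weight_edges(edges, annotations)
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    clusters, assigned = [], set()
    # Visit edges from highest to lowest score; names break ties deterministically.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for edge, score in ranked:
        if score < threshold or edge & assigned:
            continue
        cluster = set(edge)
        changed = True
        while changed:
            changed = False
            frontier = set().union(*(neighbors[p] for p in cluster)) - cluster - assigned
            for cand in sorted(frontier):
                mean = sum(weights.get(frozenset((cand, m)), 0.0) for m in cluster) / len(cluster)
                if mean >= threshold:
                    cluster.add(cand)
                    changed = True
        clusters.append(cluster)
        assigned |= cluster
    return clusters


# Toy network with made-up proteins and annotation labels (hypothetical data).
annotations = {
    "A": {"term1", "term2"},
    "B": {"term1", "term2"},
    "C": {"term1", "term2", "term3"},
    "D": {"term7", "term8"},
    "E": {"term7", "term8"},
}
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
clusters = find_clusters(edges, annotations)
```

In this toy network the C-D interaction links two groups whose annotations do not overlap, so a purely topological method could merge them, whereas the attribute-weighted sketch keeps A, B, C apart from D, E.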
