Measuring Boundedness for Protein Complex Identification in PPI Networks

The problem of identifying protein complexes in Protein-Protein Interaction (PPI) networks is usually formulated as the problem of identifying dense regions in such networks. In this paper, we present a novel approach, called TBPCI, to identify protein complexes based instead on the concept of a measure of boundedness. Such a measure is defined as an objective function of a Jaccard Index-based connectedness measure which takes into consideration how much two proteins within a network are connected to each other, and an association measure which takes into consideration how much two connecting proteins are associated based on their attributes found in the Gene Ontology database. Based on the above two measures, the objective function is derived to capture how strong the proteins can be considered as bounded together and the objective value is therefore referred as the aggregated degree of boundedness. To identify protein complexes, TBPCI computes the degree of boundedness between all possible pairwise proteins. Then, TBPCI uses a Breadth-First-Search method to determine whether a protein-pair should be incorporated into the same complex. TBPCI has been tested with several real data sets and the experimental results show it is an effective approach for identifying protein complexes in PPI networks.

[1]  Andrzej Kloczkowski,et al.  Functional clustering of yeast proteins from the protein-protein interaction network , 2006, BMC Bioinformatics.

[2]  S. Pu,et al.  Up-to-date catalogues of yeast protein complexes , 2008, Nucleic acids research.

[3]  Ioannis Xenarios,et al.  DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions , 2002, Nucleic Acids Res..

[4]  Gary D. Bader,et al.  An automated method for finding molecular complexes in large protein interaction networks , 2003, BMC Bioinformatics.

[5]  Rachael P. Huntley,et al.  The Gene Ontology Annotation (GOA) Database , 2009 .

[6]  Keith C. C. Chan,et al.  Utilizing Both Topological and Attribute Information for Protein Complex Identification in PPI Networks , 2013, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[7]  Dmitrij Frishman,et al.  MIPS: a database for genomes and protein sequences , 1999, Nucleic Acids Res..

[8]  Sean R. Collins,et al.  Global landscape of protein complexes in the yeast Saccharomyces cerevisiae , 2006, Nature.

[9]  Yang Dai,et al.  Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction , 2006, BMC Bioinformatics.

[10]  Carole A. Goble,et al.  Semantic Similarity Measures as Tools for Exploring the Gene Ontology , 2002, Pacific Symposium on Biocomputing.

[11]  Dao-Qing Dai,et al.  Detecting overlapping protein complexes based on a generative model with functional and topological properties , 2014, BMC Bioinformatics.

[12]  Dmitrij Frishman,et al.  MIPS: a database for genomes and protein sequences , 1999, Nucleic Acids Res..

[13]  Illés J. Farkas,et al.  CFinder: locating cliques and overlapping modules in biological networks , 2006, Bioinform..

[14]  Andrew K. C. Wong,et al.  Learning sequential patterns for probabilistic inductive prediction , 1994 .

[15]  Mário J. Silva,et al.  Measuring semantic similarity between Gene Ontology terms , 2007, Data Knowl. Eng..

[16]  Hans-Werner Mewes,et al.  CORUM: the comprehensive resource of mammalian protein complexes , 2007, Nucleic Acids Res..

[17]  Gang Chen,et al.  Modifying the DPClus algorithm for identifying protein complexes based on new topological structures , 2008, BMC Bioinformatics.

[18]  Keith C. C. Chan,et al.  Discovering Functional Interdependence Relationship in PPI Networks for Protein Complex Identification , 2012, IEEE Transactions on Biomedical Engineering.

[19]  Mário J. Silva,et al.  Semantic similarity over the gene ontology: family correlation and selecting disjunctive ancestors , 2005, CIKM '05.

[20]  Min Wu,et al.  A core-attachment based method to detect protein complexes in PPI networks , 2009, BMC Bioinformatics.

[21]  David Botstein,et al.  SGD: Saccharomyces Genome Database , 1998, Nucleic Acids Res..

[22]  Gary D Bader,et al.  Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry , 2002, Nature.

[23]  Xiaomei Quan,et al.  Survey: Functional Module Detection from Protein-Protein Interaction Networks , 2014, IEEE Transactions on Knowledge and Data Engineering.

[24]  Guimei Liu,et al.  Complex discovery from weighted PPI networks , 2009, Bioinform..

[25]  Mike Tyers,et al.  BioGRID: a general repository for interaction datasets , 2005, Nucleic Acids Res..

[26]  Shigehiko Kanaya,et al.  Development and implementation of an algorithm for detection of protein complexes in large interaction networks , 2006, BMC Bioinformatics.

[27]  P. Bork,et al.  Functional organization of the yeast proteome by systematic analysis of protein complexes , 2002, Nature.

[28]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[29]  L. Mirny,et al.  Protein complexes and functional modules in molecular networks , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[30]  Lawrence Hunter,et al.  Extracting Molecular Binding Relationships from Biomedical Text , 2000, ANLP.

[31]  David Botstein,et al.  GO: : TermFinder--open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes , 2004, Bioinform..

[32]  Guimei Liu,et al.  Decomposing PPI networks for complex discovery , 2010, 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[33]  S. Haberman The Analysis of Residuals in Cross-Classified Tables , 1973 .

[34]  Weiping Wang,et al.  Mining protein complexes from PPI networks using the minimum vertex cut , 2012 .

[35]  Igor Jurisica,et al.  Protein complex prediction via cost-based clustering , 2004, Bioinform..

[36]  Yi Pan,et al.  Towards the identification of protein complexes and functional modules by integrating PPI network and gene expression data , 2012, BMC Bioinformatics.

[37]  S. vanDongen Graph Clustering by Flow Simulation , 2000 .

[38]  Yi Pan,et al.  C-DEVA: Detection, evaluation, visualization and annotation of clusters from biological networks , 2016, Biosyst..

[39]  P. Bork,et al.  Proteome survey reveals modularity of the yeast cell machinery , 2006, Nature.

[40]  Gary D Bader,et al.  A Combined Experimental and Computational Strategy to Define Protein Interaction Networks for Peptide Recognition Modules , 2001, Science.

[41]  Yi Pan,et al.  Identifying Protein Complexes From Interactome Based on Essential Proteins and Local Fitness Method , 2012, IEEE Transactions on NanoBioscience.

[42]  Shoshana J. Wodak,et al.  CYGD: the Comprehensive Yeast Genome Database , 2004, Nucleic Acids Res..

[43]  Dao-Qing Dai,et al.  Protein Complexes Discovery Based on Protein-Protein Interaction Data via a Regularized Sparse Generative Network Model , 2012, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[44]  S. Dongen A cluster algorithm for graphs , 2000 .

[45]  Haiyuan Yu,et al.  Detecting overlapping protein complexes in protein-protein interaction networks , 2012, Nature Methods.

[46]  Sean R. Collins,et al.  Toward a Comprehensive Atlas of the Physical Interactome of Saccharomyces cerevisiae*S , 2007, Molecular & Cellular Proteomics.