Protein complex identification based on weighted PPI network with multi-source information.

Proteins form complexes to accomplish biological functions such as DNA transcription, mRNA translation and cell growth. Detecting protein complexes in protein-protein interaction (PPI) networks is the first step in the analysis of biological processes and pathways. Here, we propose a new framework that incorporates Gene Ontology (GO), amino acid background frequency (AABF) and data from von Mering (von Mering data) to identify protein complexes. First, we construct a weighted PPI network based on the semantic similarity of GO terms. Second, we add the von Mering data to construct six types of weighted graphs. Finally, by integrating density, diameter and cosine similarity, we define a new condition for clustering proteins in these weighted protein networks, with a specific node selected as the key node. Comparison and analysis results indicate that the proposed method achieves better performance than several classic existing approaches in terms of F-measure and precision.
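The abstract does not give the exact clustering condition, so the following is only a minimal sketch of how density, diameter and cosine similarity might be combined when growing a cluster from a key node in a weighted PPI network. All function names, the choice of key node (highest weighted degree), the thresholds and the greedy growth order are assumptions made for illustration, not the authors' method; edge weights stand in for GO-based semantic similarity scores.

```python
import math
from collections import deque


def build_graph(edges):
    """Build an undirected weighted graph as a dict of dicts."""
    w = {}
    for u, v, weight in edges:
        w.setdefault(u, {})[v] = weight
        w.setdefault(v, {})[u] = weight
    return w


def weighted_density(nodes, w):
    """Sum of internal edge weights divided by the number of possible edges."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each internal edge is seen from both endpoints, hence the division by 2.
    total = sum(w[u][v] for u in nodes for v in w[u] if v in nodes) / 2.0
    return total / (n * (n - 1) / 2.0)


def diameter(nodes, w):
    """Longest shortest path (in hops) inside the induced subgraph."""
    nodes = set(nodes)
    longest = 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in w[u]:
                if v in nodes and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(nodes):
            return math.inf  # induced subgraph is disconnected
        longest = max(longest, max(dist.values()))
    return longest


def cosine_similarity(u, v, w):
    """Cosine of the two weight vectors; each node gets a self-weight of 1.0
    (an assumption, so that adjacent nodes always share some mass)."""
    vec_u = dict(w[u])
    vec_u[u] = 1.0
    vec_v = dict(w[v])
    vec_v[v] = 1.0
    dot = sum(x * vec_v.get(k, 0.0) for k, x in vec_u.items())
    norm_u = math.sqrt(sum(x * x for x in vec_u.values()))
    norm_v = math.sqrt(sum(x * x for x in vec_v.values()))
    return dot / (norm_u * norm_v)


def select_key_node(w):
    """Hypothetical key-node rule: the node with the highest weighted degree."""
    return max(sorted(w), key=lambda u: sum(w[u].values()))


def grow_cluster(w, seed, min_density=0.5, max_diameter=2, min_cosine=0.5):
    """Greedily add neighbours that keep all three criteria satisfied."""
    cluster = {seed}
    while True:
        candidates = {v for u in cluster for v in w[u]} - cluster
        ranked = sorted(candidates,
                        key=lambda v: (-cosine_similarity(seed, v, w), v))
        for v in ranked:
            trial = cluster | {v}
            if (cosine_similarity(seed, v, w) >= min_cosine
                    and weighted_density(trial, w) >= min_density
                    and diameter(trial, w) <= max_diameter):
                cluster = trial
                break  # recompute candidates for the enlarged cluster
        else:
            return cluster  # no candidate passed, so growth stops


if __name__ == "__main__":
    # Toy network: two tight triangles joined by one weak edge.
    toy = build_graph([
        ("A", "B", 0.9), ("B", "C", 0.9), ("A", "C", 0.9),
        ("D", "E", 0.8), ("E", "F", 0.8), ("D", "F", 0.8),
        ("C", "D", 0.1),
    ])
    key = select_key_node(toy)
    print(key, sorted(grow_cluster(toy, key)))
```

In this toy network the weak bridge between the two triangles contributes little to the cosine similarity, so growth from a key node in one triangle is not expected to cross into the other. A real pipeline would replace the toy weights with GO semantic similarity scores and tune the thresholds on a benchmark complex set.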
