Reconstruction of the Protein-Protein Interaction Network for Protein Complexes Identification by Walking on the Protein Pair Fingerprints Similarity Network

Identifying protein complexes from protein-protein interaction networks (PPINs) is important to understand the science of cellular organization and function. However, PPINs produced by high-throughput studies have high false discovery rate and only represent snapshot interaction information. Reconstructing higher quality PPINs is essential for protein complex identification. Here we present a Multi-Level PPINs reconstruction (MLPR) method for protein complexes detection. From existing PPINs, we generated full combinations of every two proteins. These protein pairs are represented as a vector which includes six different sources. Then the protein pairs with same vector are mapped to the same fingerprint ID. A fingerprint similarity network is constructed next, in which a vertex represents a protein pair fingerprint ID and each vertex is connected to its top 10 similar fingerprints by edges. After random walking on the fingerprints similarity network, each vertex got a score at the steady state. According to the score of protein pairs, we considered the top ranked ones as reliable PPI and the score as the weight of edge between two distinct proteins. Finally, we expanded clusters starting from seeded vertexes based on the new weighted reliable PPINs. Applying our method on the yeast PPINs, our algorithm achieved higher F-value in protein complexes detection than the-state-of-the-art methods. The interactions in our reconstructed PPI network have more significant biological relevance than the exiting PPI datasets, assessed by gene ontology. In addition, the performance of existing popular protein complexes detection methods are significantly improved on our reconstructed network.

[1]  See-Kiong Ng,et al.  InterDom: a database of putative interacting protein domains for validating predicted protein interactions and complexes , 2003, Nucleic Acids Res..

[2]  David Botstein,et al.  SGD: Saccharomyces Genome Database , 1998, Nucleic Acids Res..

[3]  Gary D. Bader,et al.  An automated method for finding molecular complexes in large protein interaction networks , 2003, BMC Bioinformatics.

[4]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[5]  Mike Tyers,et al.  BioGRID: a general repository for interaction datasets , 2005, Nucleic Acids Res..

[6]  Bo Xu,et al.  Ontology integration to identify protein complex in protein interaction networks , 2010, 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[7]  Sean R. Collins,et al.  Toward a Comprehensive Atlas of the Physical Interactome of Saccharomyces cerevisiae*S , 2007, Molecular & Cellular Proteomics.

[8]  Christian von Mering,et al.  STRING 8—a global view on proteins and their functional interactions in 630 organisms , 2008, Nucleic Acids Res..

[9]  Alex E. Lash,et al.  Gene Expression Omnibus: NCBI gene expression and hybridization array data repository , 2002, Nucleic Acids Res..

[10]  M. Gerstein,et al.  A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data , 2003, Science.

[11]  Fang-Xiang Wu,et al.  Identifying protein complexes and functional modules - from static PPI networks to dynamic PPI networks , 2014, Briefings Bioinform..

[12]  Ioannis Xenarios,et al.  DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions , 2002, Nucleic Acids Res..

[13]  Cathy H. Wu,et al.  UniProt: the Universal Protein knowledgebase , 2004, Nucleic Acids Res..

[14]  L. Asz Random Walks on Graphs: a Survey , 2022 .

[15]  Hongfei Lin,et al.  Protein Complex Identification by Integrating Protein-Protein Interaction Evidence from Multiple Sources , 2013, PloS one.

[16]  Sean R. Collins,et al.  Global landscape of protein complexes in the yeast Saccharomyces cerevisiae , 2006, Nature.

[17]  Mong-Li Lee,et al.  Increasing confidence of protein-protein interactomes. , 2006, Genome informatics. International Conference on Genome Informatics.

[18]  P. Bork,et al.  Proteome survey reveals modularity of the yeast cell machinery , 2006, Nature.

[19]  Xiaoli Li,et al.  Computational approaches for detecting protein complexes from protein interaction networks: a survey , 2010, BMC Genomics.

[20]  Xiangrong Liu,et al.  An Empirical Study of Features Fusion Techniques for Protein-Protein Interaction Prediction , 2016 .

[21]  Jagdish Chandra Patra,et al.  Genome-wide inferring gene-phenotype relationship by walking on the heterogeneous network , 2010, Bioinform..

[22]  Carole A. Goble,et al.  Investigating Semantic Similarity Measures Across the Gene Ontology: The Relationship Between Sequence and Annotation , 2003, Bioinform..

[23]  Insuk Lee,et al.  A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality , 2007, BMC Bioinformatics.

[24]  Gang Chen,et al.  Modifying the DPClus algorithm for identifying protein complexes based on new topological structures , 2008, BMC Bioinformatics.

[25]  Devin K. Schweppe,et al.  Architecture of the human interactome defines protein communities and disease networks , 2017, Nature.

[26]  See-Kiong Ng,et al.  Systematic gene function prediction from gene expression data by using a fuzzy nearest-cluster method , 2006, BMC Bioinformatics.

[27]  Greg W. Clark,et al.  Panorama of ancient metazoan macromolecular complexes , 2015, Nature.

[28]  Min Wu,et al.  A core-attachment based method to detect protein complexes in PPI networks , 2009, BMC Bioinformatics.

[29]  Sourav S. Bhowmick,et al.  Clustering and Summarizing Protein-Protein Interaction Networks: A Survey , 2016, IEEE Transactions on Knowledge and Data Engineering.

[30]  Hui Lu,et al.  Correlation between gene expression profiles and protein-protein interactions within and across genomes , 2005, Bioinform..

[31]  David Martin,et al.  Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network , 2003, Genome Biology.

[32]  S. Dongen Graph clustering by flow simulation , 2000 .

[33]  Guimei Liu,et al.  Complex discovery from weighted PPI networks , 2009, Bioinform..

[34]  Haiyuan Yu,et al.  Detecting overlapping protein complexes in protein-protein interaction networks , 2012, Nature Methods.

[35]  Dong-Soo Han,et al.  Protein complex prediction based on simultaneous protein interaction network , 2010, Bioinform..

[36]  Tao Jiang,et al.  A Max-Flow-Based Approach to the Identification of Protein Complexes Using Protein Interaction and Microarray Data , 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.