CPredictor3.0: detecting protein complexes from PPI networks with expression data and functional annotations

Background: Effectively predicting protein complexes not only helps to understand the structures and functions of proteins and their complexes, but is also useful for diagnosing disease and developing new drugs. Many methods have been developed to detect complexes by mining dense subgraphs from static protein-protein interaction (PPI) networks, but they ignore the value of other biological information and the dynamic properties of cellular systems.

Results: In this paper, building on our previous works CPredictor and CPredictor2.0, we present CPredictor3.0, a new method for predicting complexes from PPI networks with both gene expression data and protein functional annotations. The method follows the view that proteins in the same complex should have roughly similar functions and be active at the same time and place in cellular systems. We first detect active proteins by using gene expression data from different time points and, separately, cluster proteins by using gene ontology (GO) functional annotations. Then, for each time point, we intersect each set of active proteins generated from expression data with each protein cluster generated from functional annotations. Each resulting unique set is a cluster of proteins that have similar function(s) and are active at that time point. Next, we map each cluster of active proteins of similar function onto a static PPI network and obtain a series of induced connected subgraphs, which we treat as candidate complexes. Finally, the predicted complexes are obtained by expanding and merging these candidate complexes. We evaluate CPredictor3.0 and compare it with a number of existing methods on several PPI networks and benchmark complex datasets. The experimental results show that CPredictor3.0 achieves the highest F1-measure, which indicates that it outperforms these existing methods overall.

Conclusion: CPredictor3.0 can serve as a promising tool for protein complex prediction.
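The intersection and subgraph steps described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `candidate_complexes`, the `min_size` threshold, and the toy inputs are assumptions made for illustration, the active-protein sets and GO clusters are taken as given, and the final expanding and merging step is left out because the abstract does not specify it.

```python
from collections import deque


def candidate_complexes(active_by_time, clusters, edges, min_size=2):
    """Return unique candidate complexes as frozensets of protein IDs.

    active_by_time: dict mapping a time point to its set of active proteins.
    clusters: iterable of protein sets grouped by functional similarity.
    edges: iterable of (u, v) pairs from the static PPI network.
    min_size: hypothetical cutoff, not taken from the paper.
    """
    # Build an undirected adjacency map from the static PPI edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Intersect each time point's active set with each functional cluster,
    # keeping every distinct non-empty result only once.
    groups = set()
    for active in active_by_time.values():
        for cluster in clusters:
            common = frozenset(active & cluster)
            if common:
                groups.add(common)

    # Map each group onto the network and collect the connected
    # components of the subgraph induced by that group.
    candidates = set()
    for group in groups:
        seen = set()
        for start in group:
            if start in seen:
                continue
            component = {start}
            seen.add(start)
            queue = deque([start])
            while queue:
                node = queue.popleft()
                for nbr in adj.get(node, ()):
                    if nbr in group and nbr not in seen:
                        seen.add(nbr)
                        component.add(nbr)
                        queue.append(nbr)
            if len(component) >= min_size:
                candidates.add(frozenset(component))
    return candidates


# Toy data (made up): two time points, two functional clusters, four edges.
active_by_time = {"t1": {"A", "B", "C", "E"}, "t2": {"C", "D", "E"}}
clusters = [{"A", "B", "C"}, {"D", "E"}]
edges = [("A", "B"), ("B", "C"), ("D", "E"), ("C", "D")]
result = candidate_complexes(active_by_time, clusters, edges)
```

On this toy input, proteins C and D interact in the static network but never end up in the same candidate, because they belong to different functional clusters. That is the intended effect of intersecting before mapping onto the network.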
