A new algorithm for essential proteins identification based on the integration of protein complex co-expression information and edge clustering coefficient

Essential proteins provide valuable information for biological and medical research at the systems level. Methods based only on topological centrality are strongly affected by noise in the network, so efficient methods for identifying essential proteins would be of great value. Integrating biological features is an effective way to reduce the effect of noise in protein-protein interaction (PPI) networks. In this paper, based on the observation that essential proteins evolve slowly and play a central role within a network, a new algorithm named CED is proposed. CED mainly employs gene expression levels, protein complex information and the edge clustering coefficient to predict essential proteins. The performance of CED is validated on yeast PPI networks obtained from the DIP and BioGRID databases. On both networks, CED achieves higher prediction accuracy than seven other algorithms.
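The abstract does not give CED's scoring formula. The sketch below only illustrates two building blocks that methods in this area commonly use. The first is the edge clustering coefficient in one widely used form, ECC(u, v) = z / min(d_u - 1, d_v - 1), where z is the number of triangles that contain edge (u, v) and d is node degree. The second is the Pearson correlation of two gene expression profiles, used as a co-expression measure. It is an assumption that CED uses these exact definitions. How CED weights them and combines them with protein complex information is not stated here, so no combination is shown. The function names and the zero-value conventions for degenerate cases are hypothetical.

```python
from math import sqrt


def build_adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbours) from (u, v) pairs.

    Self-loops are ignored. Hypothetical helper for illustration only.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_clustering_coefficient(adj, u, v):
    """ECC(u, v) = z / min(d_u - 1, d_v - 1), z = number of triangles on edge (u, v).

    Assumed convention: return 0.0 when either endpoint has degree 1
    (the denominator would be zero).
    """
    z = len(adj[u] & adj[v])  # common neighbours close a triangle with (u, v)
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return z / denom if denom > 0 else 0.0


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumed convention: return 0.0 if either profile is constant.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy network: triangle A-B-C plus a pendant node D attached to C.
    adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
    print(edge_clustering_coefficient(adj, "A", "B"))  # 1.0
    print(edge_clustering_coefficient(adj, "C", "D"))  # 0.0
    print(pearson([1, 2, 3], [2, 4, 6]))  # approximately 1.0
```

In this form, ECC is high for edges embedded in dense local neighbourhoods. That is the property topology-based essential-protein scores exploit. The Pearson term rewards interacting proteins whose expression rises and falls together.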