Identification of essential proteins via the network topology feature and subcellular localisation

False positives and false negatives in biological data impair the computational prediction of essential proteins. In this work, a new method called CNC is developed to detect essential proteins. First, subcellular localisation information is used to evaluate the importance of interactions in the protein network, which gives each interaction its first weight. Meanwhile, the edge clustering coefficient of each pair of interacting proteins is calculated and serves as the second weight. Next, the two weighting schemes are integrated to construct a new weighted protein network. Finally, each protein in the PPI network is scored in terms of the weighted interactions between the protein and its direct neighbours. The results show that the new centrality measure CNC is more effective at discovering essential proteins than other commonly used methods.
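The scoring pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the localisation weight is taken as the Jaccard overlap of the two proteins' compartments, the edge clustering coefficient uses the common definition (shared neighbours divided by the smaller of the two degrees minus one), the two weights are combined by multiplication, and a protein's score is the sum of its weighted edges. The function names and the toy network are hypothetical and are not taken from the CNC paper.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def edge_clustering_coefficient(adj, u, v):
    """Shared neighbours of u and v over min(deg(u) - 1, deg(v) - 1)."""
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if denom <= 0:
        return 0.0  # an edge to a degree-1 protein cannot lie in a triangle
    return len(adj[u] & adj[v]) / denom


def localisation_weight(loc, u, v):
    """ASSUMPTION: Jaccard overlap of the compartments of u and v."""
    a, b = loc.get(u, set()), loc.get(v, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_proteins(edges, loc):
    """ASSUMPTION: score = sum over neighbours of (loc weight * ECC)."""
    adj = build_adjacency(edges)
    return {
        u: sum(
            localisation_weight(loc, u, v) * edge_clustering_coefficient(adj, u, v)
            for v in adj[u]
        )
        for u in adj
    }


# Toy network (hypothetical data): a triangle A-B-C plus a pendant protein D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
loc = {
    "A": {"nucleus"},
    "B": {"nucleus"},
    "C": {"nucleus", "cytoplasm"},
    "D": {"cytoplasm"},
}
scores = score_proteins(edges, loc)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
```

In this toy example, proteins A and B score highest because their shared edge lies in a triangle and both proteins sit in the same compartment, while D scores zero because its only edge lies in no triangle. Other ways of combining the two weights, such as a weighted sum, would fit the abstract's description equally well.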
