A Novel Method to Predict Essential Proteins Based on Diffusion Distance Networks

Essential proteins are indispensable for the survival and reproduction of organisms. The growing volume of protein-protein interaction (PPI) data has prompted many computational methods for identifying them. PPI networks have been shown to exhibit graph-theoretic characteristics such as the small-world and scale-free properties. However, traditional metrics do not adequately reflect the relationships between proteins when identifying essential proteins from PPI networks. In this paper, we construct a diffusion distance network (DSN) by combining the topological characteristics of the PPI network with orthologous protein information and subcellular localization information. Taking the modularity of essential proteins into account, we propose a new essential protein prediction method based on the DSN. We apply our DSN method and ten other state-of-the-art methods to predict essential proteins, and evaluate them using precision-recall curves, the jackknife methodology, and other measures. Experimental results show that our method outperforms the ten competing methods. The raw data and the software are freely available at: https://github.com/husaiccsu/DSN.
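
To make the notion of a diffusion distance concrete, the sketch below computes the standard random-walk diffusion distance on a toy undirected graph. It is an illustration only: the abstract does not give the exact formula used to build the DSN, so the definition (squared differences of t-step transition probabilities, weighted by the inverse stationary distribution), the step count `t`, the toy adjacency matrix `toy_adj`, and all function names are assumptions, and the orthology and subcellular localization information is not modeled.

```python
from math import sqrt


def transition_matrix(adj):
    """Row-normalize an adjacency matrix into random-walk probabilities."""
    P = []
    for row in adj:
        deg = sum(row)
        if deg == 0:
            raise ValueError("isolated nodes are not supported in this sketch")
        P.append([a / deg for a in row])
    return P


def mat_mul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_distance(adj, t=2):
    """Pairwise diffusion distance after t random-walk steps (assumed definition).

    D_t(i, j)^2 = sum_k (P^t[i][k] - P^t[j][k])^2 / pi[k],
    where pi[k] = degree(k) / sum of degrees is the stationary distribution.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    total = sum(degrees)
    pi = [d / total for d in degrees]

    P = transition_matrix(adj)
    Pt = P
    for _ in range(t - 1):
        Pt = mat_mul(Pt, P)

    return [[sqrt(sum((Pt[i][k] - Pt[j][k]) ** 2 / pi[k] for k in range(n)))
             for j in range(n)]
            for i in range(n)]


# Hypothetical toy network: a triangle (0, 1, 2) with a pendant node 3 on node 2.
toy_adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
dist = diffusion_distance(toy_adj, t=2)
```

In this toy graph, nodes 0 and 1 share the same neighborhood pattern and end up closer to each other than node 0 is to the pendant node 3, which is the kind of relationship a purely degree-based metric would not capture.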
