An iteration method for identifying yeast essential proteins from heterogeneous network

Background: Essential proteins are indispensable for an organism’s survival and development, and they are crucial to disease analysis and drug design. Large-scale protein-protein interaction (PPI) data sets are available for Saccharomyces cerevisiae, which provides a valuable opportunity to identify essential proteins from PPI networks. Many computational methods based on network topology have been designed to detect essential proteins. However, these methods are limited by the incompleteness of the available PPI data. To overcome this limitation, some computational methods integrate PPI networks with multi-source biological data. Despite progress in multi-source data fusion, improving the prediction accuracy of these methods remains challenging.

Results: In this paper, we design a novel iterative model for essential protein prediction, named Randomly Walking in the Heterogeneous Network (RWHN). In RWHN, a weighted protein-protein interaction network and a domain-domain association network are first constructed from the original PPI network and the known protein-domain association network. We then establish a new heterogeneous matrix by combining the two constructed networks with the protein-domain association network. From the heterogeneous matrix, a transition probability matrix is obtained by normalization. Finally, an improved PageRank algorithm is applied to the heterogeneous network to predict essential proteins. To mitigate the influence of false negatives, information on orthologous proteins and on the subcellular localization of proteins is integrated to initialize the score vector of proteins. RWHN therefore takes the topological, conservation, and functional features of essential proteins into account in the prediction process. The experimental results show that RWHN clearly outperforms ten competing methods in predicting essential proteins.

Conclusions: We demonstrated that integrating multi-source data into a heterogeneous network can preserve the complex relationships among multiple biological data sources and improve the prediction accuracy of essential proteins. RWHN, our proposed method, is effective for the prediction of essential proteins.
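The pipeline described in the Results (heterogeneous matrix, normalization into a transition matrix, PageRank-style iteration from an initialized score vector) can be illustrated with a minimal sketch. This is a generic random walk with restart, not the RWHN implementation: the abstract does not give the edge-weighting formulas, the damping parameter, or the exact initialization, so the block layout, the value `alpha=0.85`, the toy matrices, and all function names below are assumptions made for illustration only.

```python
def row_normalize(matrix):
    """Divide each row by its sum; all-zero rows are left as zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total > 0 else 0.0 for v in row])
    return normalized


def build_heterogeneous_matrix(pp, dd, pd):
    """Assemble the square block matrix [[PP, PD], [PD^T, DD]] (assumed layout)."""
    n_prot, n_dom = len(pp), len(dd)
    pd_t = [[pd[i][j] for i in range(n_prot)] for j in range(n_dom)]
    top = [pp[i] + pd[i] for i in range(n_prot)]
    bottom = [pd_t[j] + dd[j] for j in range(n_dom)]
    return top + bottom


def random_walk_with_restart(transition, initial, alpha=0.85, tol=1e-10, max_iter=1000):
    """Iterate s <- alpha * T^T s + (1 - alpha) * s0 until the L1 change is below tol."""
    n = len(initial)
    total = sum(initial)
    restart = [v / total for v in initial]  # normalized initial score vector
    scores = restart[:]
    for _ in range(max_iter):
        updated = [
            alpha * sum(scores[i] * transition[i][j] for i in range(n))
            + (1 - alpha) * restart[j]
            for j in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    return scores


# Toy data (hypothetical): 3 proteins, 2 domains.
pp = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # weighted protein-protein network
dd = [[0, 1], [1, 0]]                   # domain-domain association network
pd = [[1, 0], [1, 1], [0, 1]]           # protein-domain associations

transition = row_normalize(build_heterogeneous_matrix(pp, dd, pd))

# Hypothetical prior scores for proteins (standing in for orthology and
# subcellular localization information); domains start at zero.
initial = [0.6, 0.3, 0.1, 0.0, 0.0]

scores = random_walk_with_restart(transition, initial)
protein_scores = scores[:len(pp)]
ranking = sorted(range(len(pp)), key=lambda i: protein_scores[i], reverse=True)
print("protein ranking (indices, highest score first):", ranking)
```

Proteins are then ranked by their final scores, and the top-ranked ones are taken as candidate essential proteins. The restart term keeps the prior information in the initial vector influential throughout the walk, which is one common way to let non-topological evidence offset missing interactions.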
