Network Embedding the Protein–Protein Interaction Network for Human Essential Genes Identification

Essential genes are a group of genes that are indispensable for cell survival and cell fertility. Studying human essential genes not only helps scientists reveal the underlying biological mechanisms of a human cell but also guides disease treatment. The recent publication of human essential gene data has made it possible for researchers to train a machine-learning classifier on features of known human essential genes and to use that classifier to predict new ones. Previous studies have found that the essentiality of a gene is closely related to its properties in the protein–protein interaction (PPI) network. In this work, we propose a novel supervised method that predicts human essential genes by network embedding the PPI network. Our approach runs a biased random walk on the network to obtain the network context of each node. The resulting node pairs are then input into an artificial neural network to learn representation vectors that maximally preserve the network structure and the properties of the nodes in the network. Finally, these learned features are fed into a support vector machine (SVM) classifier to predict human essential genes. Prediction results on two human PPI networks show that our method performs better than methods that use either the genes' sequence information or the genes' centrality properties in the network as input features. It also outperforms methods that represent the PPI network using other, earlier approaches.
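
The first two steps of the pipeline described above (a biased random walk, then the extraction of node pairs from each walk) can be illustrated with a small sketch. The abstract does not say how the walk is biased, so this example assumes a node2vec-style second-order bias with a return parameter `p` and an in-out parameter `q`. The function names, the toy graph, and the window size are hypothetical choices for illustration, not the authors' implementation. The neural network and SVM stages are left out so that the sketch needs only the standard library.

```python
import random


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """Run one node2vec-style biased walk (an assumed form of bias).

    adj maps each node to the set of its neighbours (undirected graph).
    p penalises stepping back to the previous node; q penalises moving
    to nodes that are not neighbours of the previous node.
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # sorted for reproducible sampling
        if not nbrs:
            break
        if len(walk) == 1:
            # No previous node yet, so the first step is uniform.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for n in nbrs:
            if n == prev:
                weights.append(1.0 / p)   # return to previous node
            elif n in adj[prev]:
                weights.append(1.0)       # stay close to previous node
            else:
                weights.append(1.0 / q)   # move further away
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def context_pairs(walk, window=2):
    """Return (center, context) pairs for nodes within `window` steps."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Toy PPI network with hypothetical protein names.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

rng = random.Random(0)
walk = biased_walk(adj, "A", length=6, p=1.0, q=0.5, rng=rng)
pairs = context_pairs(walk, window=2)
```

In a full pipeline, pairs like these would be the training input for the embedding network, and the learned vectors would in turn be the features passed to the classifier.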
