Detecting Essential Proteins Based on Network Topology, Gene Expression Data, and Gene Ontology Information

The identification of essential proteins in protein-protein interaction (PPI) networks is of great significance for understanding cellular processes. With the increasing availability of large-scale PPI data, numerous centrality measures based on network topology have been proposed to detect essential proteins from PPI networks. However, most current approaches focus mainly on the topological structure of PPI networks and largely ignore Gene Ontology (GO) annotation information. In this paper, we propose a novel centrality measure, called TEO, for identifying essential proteins by combining network topology, gene expression profiles, and GO information. To evaluate the performance of the TEO method, we compare it with five other methods (degree, betweenness, NC, Pec, and CowEWC) in detecting essential proteins from two different yeast PPI datasets. The simulation results show that adding GO information can effectively improve prediction precision and that our method outperforms the others in predicting essential proteins.
