Gene Essentiality Prediction Using Topological Features From Metabolic Networks

Fundamental questions, such as which genes are truly necessary for cell survival, have motivated many studies of gene essentiality across species. Early efforts addressed this problem through exhaustive knockout experiments in simple bacteria. More recently, the results of these studies have been applied in the emerging field of synthetic biology, with possible implications for fields such as health and energy. Driven by advances in DNA sequencing technology and high-throughput biological data generation, many recent efforts have also focused on building and understanding biological networks. In particular, metabolic networks represent the set of known biochemical reactions within a cell. Essential genes are expected to play a key role in these networks, since they must be involved in vital metabolic pathways. Although some studies have investigated the correlation between essential genes and biological network information, they usually combined different types of networks with other biological information and did not isolate the contribution of each source to the results. This paper describes an attempt to predict essential genes using solely topological features from metabolic networks. The networks were built from a single common repository, the KEGG database, to ensure data uniformity. In experiments covering different prediction scenarios and reference organisms, topological features from metabolic networks achieved a mean AUC of about 70% in predicting gene essentiality. This indicates that additional factors affect essentiality and should be considered in order to obtain more accurate predictions.
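
To make the idea of scoring genes by topological features concrete, the following is a minimal sketch and not the pipeline used in the paper. It assumes a hypothetical toy network whose nodes are genes (the abstract does not say how the KEGG networks are projected onto genes), computes two common centrality measures, and evaluates each as an essentiality score with AUC. The gene names, edges, and essentiality labels are all invented for illustration.

```python
from collections import deque

# Hypothetical toy gene network (undirected adjacency sets); not KEGG data.
TOY_GRAPH = {
    "g1": {"g2", "g3", "g4"},
    "g2": {"g1", "g3"},
    "g3": {"g1", "g2"},
    "g4": {"g1", "g5"},
    "g5": {"g4"},
}

# Hypothetical essentiality labels (1 = essential, 0 = non-essential).
TOY_LABELS = {"g1": 1, "g2": 0, "g3": 0, "g4": 1, "g5": 0}


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path distances."""
    result = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in graph[current]:
                if neighbor not in dist:
                    dist[neighbor] = dist[current] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def auc(scores, labels):
    """AUC as the probability that an essential gene outscores a non-essential
    one, counting ties as one half (the Mann-Whitney formulation)."""
    positives = [scores[g] for g in scores if labels[g] == 1]
    negatives = [scores[g] for g in scores if labels[g] == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


degree_scores = degree_centrality(TOY_GRAPH)
closeness_scores = closeness_centrality(TOY_GRAPH)
degree_auc = auc(degree_scores, TOY_LABELS)
closeness_auc = auc(closeness_scores, TOY_LABELS)
print(f"degree AUC: {degree_auc:.3f}, closeness AUC: {closeness_auc:.3f}")
```

In practice, several such features would typically be combined in a trained classifier and evaluated on held-out genes or organisms; a single feature used directly as a score, as here, only illustrates how an AUC value relates topology to essentiality.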