Geptop: A Gene Essentiality Prediction Tool for Sequenced Bacterial Genomes Based on Orthology and Phylogeny

Integrative genomics predictors, which score highly in predicting bacterial essential genes, are infeasible for most species because the required data sources are limited. We developed a universal approach and tool, designated Geptop, that provides gene essentiality annotations based on orthology and phylogeny. In a series of tests, Geptop yielded higher area under the curve (AUC) scores for receiver operating characteristic (ROC) curves than the integrative approaches. In ten-fold cross-validation on randomly shuffled samples, Geptop yielded an AUC of 0.918, and in cross-organism predictions for 19 organisms it yielded AUC scores between 0.569 and 0.959. A test on the very recently determined essential gene dataset of Porphyromonas gingivalis, which belongs to a phylum different from those of all 19 bacterial genomes above, gave an AUC of 0.77. Geptop can therefore be applied to any bacterial species whose genome has been sequenced. Compared with the essential genes identified only by lethal screening, the essential genes predicted only by Geptop are associated with more protein-protein interactions, especially in the three bacteria with lower AUC scores (<0.7). This may further support the reliability and feasibility of our method. The web server and standalone version of Geptop are available free of charge at http://cefg.uestc.edu.cn/geptop/. The tool has been run on 968 bacterial genomes, and the results are accessible on the website.
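For readers unfamiliar with the AUC figures quoted above, the following is a minimal sketch of how an AUC can be computed from per-gene essentiality scores and experimental labels. It uses the rank-based definition: the probability that a randomly chosen essential gene scores higher than a randomly chosen non-essential gene, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not Geptop's scoring scheme or data.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (essential, non-essential) pairs
    in which the essential gene has the higher score; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]  # essential genes
    neg = [s for l, s in zip(labels, scores) if l == 0]  # non-essential genes
    if not pos or not neg:
        raise ValueError("need at least one essential and one non-essential gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = essential in the screen, 0 = non-essential.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported range of 0.569 to 0.959.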
