A Statistical Framework for Improving Genomic Annotations of Prokaryotic Essential Genes

Large-scale systematic analysis of gene essentiality is an important step toward unraveling the complex relationship between genotypes and phenotypes. Such analysis cannot be accomplished without unbiased and accurate annotations of essential genes. In current genomic databases, most essential gene annotations are derived from whole-genome transposon mutagenesis (TM), the most frequently used experimental approach for determining essential genes in microorganisms under defined conditions. However, TM experiments carry substantial systematic biases. In this study, we developed a novel Poisson model–based statistical framework that simulates the TM insertion process and then corrects the experimental biases. We first quantitatively assessed the effects of the major factors that potentially influence the accuracy of TM and incorporated the relevant factors into the framework. By iteratively optimizing parameters, we inferred the insertion events that actually occurred and expressed each gene's essentiality as a probability. Evaluated against the definitive map of essential genes in Escherichia coli, our model significantly improved the accuracy of the original TM datasets, resulting in more accurate annotations of essential genes. Our method also showed encouraging results in improving subsaturation-level TM datasets. To test the model's applicability to other bacteria, we applied it to Pseudomonas aeruginosa PAO1 and Francisella tularensis novicida TM datasets. We validated our predictions against the literature and by allelic exchange experiments in PAO1. Our model was correct on six of the seven genes tested. Remarkably, in all three cases where our predictions contradicted the TM assignments, experimental validation supported our predictions. In summary, our method is a promising tool for improving genomic annotations of essential genes and enabling large-scale explorations of gene essentiality.
Our contribution is timely given the rapidly increasing number of essential gene sets. A web server has been set up to provide convenient access to this tool. All results and source code are available for download upon publication at http://research.cchmc.org/essentialgene/.
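
To illustrate why a Poisson view of the insertion process matters, the following minimal sketch computes how likely a non-essential gene is to escape insertion by chance, and what that implies for a gene with no observed insertions. It is not the framework described above: it assumes a single uniform insertion rate per base pair, assumes essential genes tolerate no insertions at all, and ignores the bias factors the framework models. The function names, the prior, and the example rate are all hypothetical.

```python
import math


def prob_no_insertion(gene_length_bp, insertions_per_bp):
    """P(zero insertions) in a non-essential gene under a homogeneous
    Poisson process (assumed uniform rate; a simplification)."""
    return math.exp(-insertions_per_bp * gene_length_bp)


def posterior_essential(gene_length_bp, insertions_per_bp,
                        prior_essential, observed_insertions):
    """Posterior probability that a gene is essential, by Bayes' rule.

    Simplifying assumption: an essential gene never carries an insertion,
    so any observed insertion gives a posterior of zero.
    """
    if observed_insertions > 0:
        return 0.0
    p_miss = prob_no_insertion(gene_length_bp, insertions_per_bp)
    evidence = prior_essential + (1.0 - prior_essential) * p_miss
    return prior_essential / evidence


if __name__ == "__main__":
    rate = 0.005   # hypothetical insertions per base pair
    prior = 0.1    # hypothetical prior fraction of essential genes
    for length in (100, 1000):
        p = posterior_essential(length, rate, prior, observed_insertions=0)
        print(f"{length} bp, no insertions: P(essential) = {p:.3f}")
```

Under these assumptions, a short gene with no insertions remains only weakly supported as essential, because it can easily be missed by chance, while a long gene with no insertions is strongly supported. This is the kind of length-dependent bias that a probabilistic treatment of TM data is meant to account for.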
