Advances and perspectives in computational prediction of microbial gene essentiality.

Genes belonging to the minimal subset required for the growth, survival and viability of an organism are classified as essential genes. Knowledge of essential genes gives insight into the core structure and functioning of a cell. This might lead to more efficient antimicrobial drug discovery, to elucidation of the correlations between genotype and phenotype, and to a better understanding of the minimal requirements for a (synthetic) cell. Traditionally, constructing a catalog of essential genes for a given microbe has involved costly and time-consuming laboratory experiments. While experimental methods have produced abundant gene essentiality data for model organisms such as Escherichia coli and Bacillus subtilis, the knowledge generated cannot automatically be extrapolated to predict essential genes in all bacteria. In addition, essential genes identified in the laboratory are by definition 'conditionally essential': they are essential under the specified experimental conditions, which might not resemble conditions in the microorganism's natural habitat(s). Large-scale experimental assaying for essential genes is also not always feasible because of the time investment required to set up these assays. The ability to rapidly and precisely identify essential genes in silico is therefore important and has great potential for applications in medicine, biotechnology and basic biological research. Here, we review the advances made in the use of computational methods to predict microbial gene essentiality, the perspectives for the future of these techniques, and the possible practical applications of essential genes.

[1]  L. Herman,et al.  Bacillus sporothermodurans and other highly heat‐resistant spore formers in milk , 2006, Journal of applied microbiology.

[2]  Ross S Hall,et al.  Drug target prediction and prioritization: using orthology to predict essentiality in parasite genomes , 2010, BMC Genomics.

[3]  Isabelle Queinnec,et al.  Transcriptome and Proteome Exploration to Model Translation Efficiency and Protein Stability in Lactococcus lactis , 2009, PLoS Comput. Biol..

[4]  Ali A. Minai,et al.  Investigating the predictability of essential genes across distantly related organisms using an integrative approach , 2010, Nucleic acids research.

[5]  D. Higgins,et al.  T-Coffee: A novel method for fast and accurate multiple sequence alignment. , 2000, Journal of molecular biology.

[6]  Adam M. Feist,et al.  Reconstruction of biochemical networks in microorganisms , 2009, Nature Reviews Microbiology.

[7]  Stanley Falkow,et al.  Global Transposon Mutagenesis and Essential Gene Analysis of Helicobacter pylori , 2004, Journal of bacteriology.

[8]  Stephen C. J. Parker,et al.  Towards the identification of essential genes using targeted genome sequencing and comparative analysis , 2006, BMC Genomics.

[9]  Ronald W. Davis,et al.  Functional profiling of the Saccharomyces cerevisiae genome , 2002, Nature.

[10]  Edward J. O'Brien,et al.  Using Genome-scale Models to Predict Biological Capabilities , 2015, Cell.

[11]  Thomas H Segall-Shapiro,et al.  Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome , 2010, Science.

[12]  M. Gerstein,et al.  Relating whole-genome expression data with protein-protein interactions. , 2002, Genome research.

[13]  J. Thompson,et al.  The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. , 1997, Nucleic acids research.

[14]  Joshua A. Lerman,et al.  COBRApy: COnstraints-Based Reconstruction and Analysis for Python , 2013, BMC Systems Biology.

[15]  C. Hutchison,et al.  Essential genes of a minimal bacterium. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[16]  P. Loubière,et al.  Assessment of the Diversity of Dairy Lactococcus lactis subsp. lactis Isolates by an Integrated Approach Combining Phenotypic, Genomic, and Transcriptomic Analyses , 2010, Applied and Environmental Microbiology.

[17]  Huiru Zheng,et al.  From Experimental Approaches to Computational Techniques: A Review on the Prediction of Protein-Protein Interactions , 2010, Adv. Artif. Intell..

[18]  Ronan M. T. Fleming,et al.  Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0 , 2007, Nature Protocols.

[19]  A. Camilli,et al.  Transposon insertion sequencing: a new tool for systems-level analysis of microorganisms , 2013, Nature Reviews Microbiology.

[20]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[21]  O. White,et al.  Global transposon mutagenesis and a minimal Mycoplasma genome. , 1999, Science.

[22]  Rob Knight,et al.  Identifying genetic determinants needed to establish a human gut symbiont in its habitat. , 2009, Cell host & microbe.

[23]  Pedro M. Domingos A few useful things to know about machine learning , 2012, Commun. ACM.

[24]  Lars Barquist,et al.  Approaches to querying bacterial genomes with transposon-insertion sequencing , 2013, RNA biology.

[25]  J. Shea,et al.  Simultaneous identification of bacterial virulence genes by negative selection. , 1995, Science.

[26]  Debkumar Chakraborty,et al.  Biotechnological and Molecular Approaches for Vanillin Production: a Review , 2013, Applied Biochemistry and Biotechnology.

[27]  J. Woodcock,et al.  Translation of pharmacogenomics and pharmacogenetics: a regulatory perspective , 2004, Nature Reviews Drug Discovery.

[28]  Jianzhi Zhang,et al.  Why Do Hubs Tend to Be Essential in Protein Networks? , 2006, PLoS genetics.

[29]  J. M. Jay Fermented Foods and Related Products of Fermentation , 1992 .

[30]  J. Craig Venter,et al.  Genome Transplantation in Bacteria: Changing One Species to Another , 2007, Science.

[31]  Rick L. Stevens,et al.  High-throughput generation, optimization and analysis of genome-scale metabolic models , 2010, Nature Biotechnology.

[32]  D. Pieper,et al.  Engineering bacteria for bioremediation. , 2000, Current opinion in biotechnology.

[33]  J. Mueller,et al.  Oil spill bioremediation: experiences, lessons and results from the Exxon Valdez oil spill in Alaska , 1992, Biodegradation.

[34]  Georgia Giannoukos,et al.  Tracking insertion mutants within libraries by deep sequencing and a genome-wide screen for Haemophilus genes required in the lung , 2009, Proceedings of the National Academy of Sciences.

[35]  M. Frank-Kamenetskii,et al.  Base-stacking and base-pairing contributions into thermal stability of the DNA double helix , 2006, Nucleic acids research.

[36]  S. Carroll,et al.  The regulatory content of intergenic DNA shapes genome architecture , 2004, Genome Biology.

[37]  Gregory A. Buck,et al.  Genome-wide essential gene identification in Streptococcus sanguinis , 2011, Scientific reports.

[38]  F. Doyle,et al.  Dynamic flux balance analysis of diauxic growth in Escherichia coli. , 2002, Biophysical journal.

[39]  E. V. van Munster,et al.  Imaging in situ protein-DNA interactions in the cell nucleus using FRET-FLIM. , 2005, Experimental cell research.

[40]  Eduardo P C Rocha,et al.  Essentiality, not expressiveness, drives gene-strand bias in bacteria , 2003, Nature Genetics.

[41]  E. Ruppin,et al.  Regulatory on/off minimization of metabolic flux changes after genetic perturbations. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[42]  H. Mori,et al.  Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection , 2006, Molecular systems biology.

[43]  Robert C. Edgar,et al.  MUSCLE: multiple sequence alignment with high accuracy and high throughput. , 2004, Nucleic acids research.

[44]  Ney Lemke,et al.  Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information , 2009, BMC Bioinformatics.

[45]  A. Mushegian,et al.  The minimal genome concept. , 1999, Current opinion in genetics & development.

[46]  Dongsup Kim,et al.  Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe , 2010, Nature Biotechnology.

[47]  K. Matuschewski,et al.  Genetic crosses and complementation reveal essential functions for the Plasmodium stage‐specific actin2 in sporogonic development , 2014, Cellular microbiology.

[48]  B. Palsson,et al.  Towards genome-scale signalling-network reconstructions , 2010, Nature Reviews Genetics.

[49]  Joshua A. Lerman,et al.  Genome-scale metabolic reconstructions of multiple Escherichia coli strains highlight strain-specific adaptations to nutritional environments , 2013, Proceedings of the National Academy of Sciences.

[50]  Thomas Dick,et al.  In silico analyses for the discovery of tuberculosis drug targets. , 2013, The Journal of antimicrobial chemotherapy.

[51]  R. D. Tripathi,et al.  Environmental bioremediation technologies , 2007 .

[52]  Eduardo Abeliuk,et al.  The essential genome of a bacterium , 2011, Molecular systems biology.

[53]  Michael R. Seringhaus,et al.  Predicting essential genes in fungal genomes. , 2006, Genome research.

[54]  T. Kigawa,et al.  A Fluorescent-Based High-Throughput Screening Assay for Small Molecules That Inhibit the Interaction of MdmX with p53 , 2013, Journal of biomolecular screening.

[55]  H. Leonhardt,et al.  Visualization and targeted disruption of protein interactions in living cells , 2013, Nature Communications.

[56]  A. Camilli,et al.  Tn-seq; high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms , 2009, Nature Methods.

[57]  M. Nout,et al.  Microbiota of cocoa powder with particular reference to aerobic thermoresistant spore-formers. , 2011, Food microbiology.

[58]  B. Palsson,et al.  Constraining the metabolic genotype–phenotype relationship using a phylogeny of in silico methods , 2012, Nature Reviews Microbiology.

[59]  H. Bussey,et al.  Large‐scale essential gene identification in Candida albicans and applications to antifungal drug discovery , 2003, Molecular microbiology.

[60]  Aarash Bordbar,et al.  Functional characterization of alternate optimal solutions of Escherichia coli's transcriptional and translational machinery. , 2010, Biophysical journal.

[61]  Jeffrey D Orth,et al.  What is flux balance analysis? , 2010, Nature Biotechnology.

[62]  E. Koonin,et al.  Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. , 2002, Genome research.

[63]  G. Arndt,et al.  Genome‐wide screening for gene function using RNAi in mammalian cells , 2005, Immunology and cell biology.

[64]  Leo Eberl,et al.  Essence of life: essential genes of minimal genomes. , 2011, Trends in cell biology.

[65]  James R. Brown,et al.  A Global Approach to Identify Novel Broad-Spectrum Antibacterial Targets among Proteins of Unknown Function , 2004, Journal of Molecular Microbiology and Biotechnology.

[66]  P. Alberch From genes to phenotype: dynamical systems and evolvability , 2004, Genetica.

[67]  A. Moya,et al.  Determination of the Core of a Minimal Bacterial Gene Set , 2004, Microbiology and Molecular Biology Reviews.

[68]  M. Gerstein,et al.  Genomic analysis of essentiality within protein networks. , 2004, Trends in genetics : TIG.

[69]  Sang Yup Lee,et al.  Metabolite essentiality elucidates robustness of Escherichia coli metabolism , 2007, Proceedings of the National Academy of Sciences.

[70]  Corey Nislow,et al.  Recent advances and method development for drug target identification. , 2010, Trends in pharmacological sciences.

[71]  Ali R. Zomorrodi,et al.  Mathematical optimization applications in metabolic networks. , 2012, Metabolic engineering.

[72]  C. Francke,et al.  Reconstructing the metabolic network of a bacterium from its genome. , 2005, Trends in microbiology.

[73]  D. Eisenberg,et al.  A combined algorithm for genome-wide prediction of protein function , 1999, Nature.

[74]  S. Fields,et al.  A novel genetic system to detect protein–protein interactions , 1989, Nature.

[75]  Rodrigo Lopez,et al.  Clustal W and Clustal X version 2.0 , 2007, Bioinform..

[76]  Aldert L. Zomer,et al.  From microbial gene essentiality to novel antimicrobial drug targets , 2014, BMC Genomics.

[77]  T J Dougherty,et al.  Concordance analysis of microbial genomes. , 1998, Nucleic acids research.

[78]  E. Koonin,et al.  A minimal gene set for cellular life derived by comparison of complete bacterial genomes. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[79]  S. Ehrlich,et al.  Essential Bacillus subtilis genes , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[80]  Jae-Hoon Song,et al.  Identification of essential genes in Streptococcus pneumoniae by allelic replacement mutagenesis. , 2005, Molecules and cells.

[81]  Eric D Brown,et al.  Are essential genes really essential? , 2009, Trends in microbiology.

[82]  Bernhard O. Palsson,et al.  BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions , 2010, BMC Bioinformatics.

[83]  Li Zhao,et al.  Training Set Selection for the Prediction of Essential Genes , 2014, PloS one.

[84]  Robin D Dowell,et al.  Genotype to Phenotype: A Complex Problem , 2010, Science.

[85]  Rick L. Stevens,et al.  The RAST Server: Rapid Annotations using Subsystems Technology , 2008, BMC Genomics.

[86]  J. Mekalanos,et al.  A genome-scale analysis for identification of genes required for growth or survival of Haemophilus influenzae , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[87]  Z. Rehman,et al.  Microbial alginate production, modification and its applications , 2013, Microbial biotechnology.

[88]  Leopold Parts,et al.  Simultaneous assay of every Salmonella Typhi gene using one million transposon mutants. , 2009, Genome research.

[89]  Feng Gao,et al.  Protein Localization Analysis of Essential Genes in Prokaryotes , 2014, Scientific Reports.

[90]  G. Church,et al.  Analysis of optimality in natural and perturbed metabolic networks , 2002 .

[91]  Dong Xu,et al.  Understanding protein dispensability through machine-learning analysis of high-throughput data , 2005, Bioinform..

[92]  Kathryn E. Hentges,et al.  Defining the Role of Essential Genes in Human Disease , 2011, PloS one.

[93]  Roland Eils,et al.  Identifying essential genes in bacterial metabolic networks with machine learning methods , 2010, BMC Systems Biology.

[94]  Peter Uetz,et al.  Protein Domains of Unknown Function Are Essential in Bacteria , 2013, mBio.

[95]  Matthew W. Hahn,et al.  Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. , 2005, Molecular biology and evolution.

[96]  Antoine Danchin,et al.  How essential are nonessential genes? , 2005, Molecular biology and evolution.