Model-based identification of conditionally-essential genes from transposon-insertion sequencing data

The understanding of bacterial gene function has been greatly enhanced by recent advances in the deep sequencing of microbial genomes. Transposon insertion sequencing methods combine next-generation sequencing techniques with transposon mutagenesis to explore the essentiality of genes under different environmental conditions. We propose a model-based method that uses regularized negative binomial regression to estimate the change in transposon insertions attributable to gene-environment changes without transformations or uniform normalization. An empirical Bayes model for estimating the local false discovery rate combines unique and total count information to test for genes that show a statistically significant change in transposon counts. When applied to RB-TnSeq (randomized barcode transposon sequencing) and Tn-seq (transposon sequencing) libraries made in strains of Caulobacter crescentus, using both total and unique count data, the model identified a set of conditionally essential genes for each target condition that sheds light on their functions and roles during various stress conditions.

Author summary

Transposon insertion sequencing allows the study of bacterial gene function by combining next-generation sequencing techniques with transposon mutagenesis under different genetic and environmental perturbations. Our proposed regularized negative binomial regression method improves the quality of analysis of this data.
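To make the notion of a local false discovery rate concrete, the following is a minimal sketch of the generic two-groups calculation, fdr(z) = pi0 * f0(z) / f(z). It is not the empirical Bayes model proposed here, which combines unique and total count information. The function name `local_fdr`, the standard normal null, the Gaussian kernel density estimate, the default `pi0 = 1.0`, and the simulated z-scores are all assumptions made for illustration.

```python
import numpy as np


def local_fdr(z, pi0=1.0, bandwidth=None):
    """Two-groups local fdr, fdr(z) = pi0 * f0(z) / f(z).

    Assumptions (illustrative only): f0 is the standard normal density,
    f is estimated with a Gaussian kernel density estimate, and pi0 = 1.0
    is used as a conservative upper bound on the null proportion.
    """
    z = np.asarray(z, dtype=float)
    n = z.size
    if bandwidth is None:
        # Silverman-style rule-of-thumb bandwidth
        bandwidth = 1.06 * z.std(ddof=1) * n ** (-1 / 5)
    # Kernel density estimate of the mixture density f at every z_i
    diffs = (z[:, None] - z[None, :]) / bandwidth
    f = np.exp(-0.5 * diffs**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    # Theoretical null density f0 = N(0, 1)
    f0 = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return np.clip(pi0 * f0 / f, 0.0, 1.0)


# Simulated per-gene z-scores (hypothetical data): 950 unchanged genes
# and 50 genes with strongly reduced insertion counts.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.0, 1.0, 950), rng.normal(-5.0, 1.0, 50)])
fdr = local_fdr(z)
candidates = np.flatnonzero(fdr < 0.2)  # indices of genes flagged as changed
```

Genes with a small local fdr are unlikely to belong to the null group. The 0.2 cutoff above is a commonly used convention rather than a value taken from this work.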
