ISMapper: identifying transposase insertion sites in bacterial genomes from short read sequence data

Background
Insertion sequences (IS) are small transposable elements commonly found in bacterial genomes. Identifying the location of IS in bacterial genomes is useful for purposes including epidemiological tracking and the prediction of antibiotic resistance. However, IS are commonly present in multiple copies in a single genome, which complicates both genome assembly and the identification of IS insertion sites. Here we present ISMapper, a mapping-based tool that identifies the site and orientation of IS insertions in bacterial genomes directly from paired-end short-read data.

Results
ISMapper was validated using three types of short-read data: (i) simulated reads from a variety of species, (ii) Illumina reads from 5 isolates for which finished genome sequences were available for comparison, and (iii) Illumina reads from 7 Acinetobacter baumannii isolates for which predicted IS locations were tested using PCR. A total of 20 genomes, representing 13 species and 32 distinct IS, were used for validation. ISMapper correctly identified 97% of known IS insertions in simulated reads and 98% in real Illumina reads. Subsampling real Illumina reads to lower depths showed that ISMapper could correctly detect insertions at average genome-wide read depths >20x, although depths >50x were required to obtain confident calls that were highly supported by read evidence. All ISAba1 insertions identified by ISMapper in the A. baumannii genomes were confirmed by PCR. In each A. baumannii genome, ISMapper identified an IS insertion upstream of the ampC beta-lactamase gene that could explain phenotypic resistance to third-generation cephalosporins.
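The general mapping-based idea of locating an insertion from paired-end reads can be illustrated with a small sketch. This is not ISMapper's implementation; it assumes that reads overlapping the IS left and right ends have already been found and that their mates (the flanking reads) have been aligned to the reference and reduced to hypothetical half-open `(start, end)` intervals. Flanking reads are merged into regions, and a left-end region and a right-end region that sit next to each other, or overlap by a few bases as expected from a target site duplication, are paired into one call. The orientation convention (`+` when the left-end flank lies at lower reference coordinates) and the parameters `min_reads`, `max_gap` and `max_overlap` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical input format: half-open [start, end) alignment interval on the reference.
Interval = Tuple[int, int]


@dataclass
class InsertionCall:
    left_flank: Interval   # region covered by mates of reads at the IS left end
    right_flank: Interval  # region covered by mates of reads at the IS right end
    orientation: str       # assumed convention: "+" if the left-end flank lies upstream
    gap: int               # bases between the flanks; negative means they overlap


def merge_reads(reads: List[Interval], min_reads: int) -> List[Interval]:
    """Merge overlapping read alignments into regions; keep regions with >= min_reads reads."""
    regions: List[List[int]] = []  # each entry: [start, end, read_count]
    for start, end in sorted(reads):
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
            regions[-1][2] += 1
        else:
            regions.append([start, end, 1])
    return [(s, e) for s, e, n in regions if n >= min_reads]


def call_insertions(left_reads: List[Interval], right_reads: List[Interval],
                    min_reads: int = 3, max_gap: int = 20,
                    max_overlap: int = 20) -> List[InsertionCall]:
    """Pair left-end and right-end flank regions that lie next to each other."""
    right_regions = merge_reads(right_reads, min_reads)
    calls = []
    for lf in merge_reads(left_reads, min_reads):
        for rf in right_regions:
            gap_fwd = rf[0] - lf[1]  # left-end flank upstream of right-end flank
            gap_rev = lf[0] - rf[1]  # right-end flank upstream of left-end flank
            if -max_overlap <= gap_fwd <= max_gap:
                calls.append(InsertionCall(lf, rf, "+", gap_fwd))
            elif -max_overlap <= gap_rev <= max_gap:
                calls.append(InsertionCall(lf, rf, "-", gap_rev))
    return calls


# Toy data: three left-end flanking reads, one isolated noise read, three right-end reads.
left = [(900, 1005), (920, 1005), (950, 1005), (5000, 5100)]
right = [(996, 1096), (996, 1080), (996, 1050)]
for call in call_insertions(left, right):
    print(call)
```

In this toy example the two flanks overlap by 9 bp, consistent with a 9-bp target site duplication, and the isolated read near position 5000 is discarded because its region falls below `min_reads`. Requiring support from both IS ends is what lets a mapping-based approach place an insertion even when the IS itself occurs in many copies.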
The utility of ISMapper was further demonstrated by profiling genome-wide IS6110 insertions in 138 publicly available Mycobacterium tuberculosis genomes, revealing lineage-specific insertions and multiple insertion hotspots.

Conclusions
ISMapper provides a rapid and robust method for identifying IS insertion sites directly from short-read data, with high accuracy demonstrated across a wide range of bacteria.
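The genome-wide IS6110 comparison summarised above (lineage-specific insertions and insertion hotspots) amounts to post-processing per-genome insertion calls. The sketch below is an illustration under stated assumptions, not the analysis used in the study. It takes hypothetical per-genome lists of insertion positions, merges positions within a `tolerance` into shared sites, reports sites carried by every genome of exactly one lineage (using a hypothetical `lineage_of` mapping), and flags fixed-size bins containing at least `min_sites` distinct sites as candidate hotspots. The tolerance, bin size and threshold values are illustrative only.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Site:
    start: int
    end: int
    genomes: Set[str] = field(default_factory=set)


def group_sites(calls: Dict[str, List[int]], tolerance: int = 10) -> List[Site]:
    """Merge per-genome insertion positions into shared sites.

    Positions from all genomes are sorted; a position within `tolerance` bp of
    the current site's last position joins that site, otherwise it opens a new one.
    """
    points = sorted((pos, genome) for genome, positions in calls.items() for pos in positions)
    sites: List[Site] = []
    for pos, genome in points:
        if sites and pos - sites[-1].end <= tolerance:
            sites[-1].end = pos
            sites[-1].genomes.add(genome)
        else:
            sites.append(Site(pos, pos, {genome}))
    return sites


def lineage_specific(sites: List[Site], lineage_of: Dict[str, str]) -> Dict[str, List[Site]]:
    """Sites carried by every genome of one lineage and by no genome outside it."""
    members: Dict[str, Set[str]] = defaultdict(set)
    for genome, lineage in lineage_of.items():
        members[lineage].add(genome)
    result: Dict[str, List[Site]] = defaultdict(list)
    for site in sites:
        for lineage, genomes in members.items():
            if site.genomes == genomes:
                result[lineage].append(site)
    return dict(result)


def hotspots(sites: List[Site], bin_size: int = 1000, min_sites: int = 3) -> List[int]:
    """Start coordinates of fixed-size bins holding at least `min_sites` distinct sites."""
    counts = Counter(site.start // bin_size for site in sites)
    return sorted(b * bin_size for b, n in counts.items() if n >= min_sites)


# Toy data: insertion positions called in three hypothetical genomes.
calls = {"g1": [1000, 5000, 5300, 5600], "g2": [1003, 5002, 9000], "g3": [9004, 5301]}
sites = group_sites(calls)
print(lineage_specific(sites, {"g1": "A", "g2": "A", "g3": "B"}))
print(hotspots(sites))
```

Because sites are grown by single linkage, a run of closely spaced positions can chain into one site wider than `tolerance`; a fuller analysis would probably also take insertion orientation into account and use a less strict definition of "lineage-specific" than exact set equality.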
