Detecting Horizontal Gene Transfer by Mapping Sequencing Reads Across Species Boundaries

MOTIVATION Horizontal gene transfer (HGT) is a fundamental mechanism that enables organisms such as bacteria to transfer genetic material directly between distant species. In this way, bacteria can acquire new traits such as antibiotic resistance or pathogenic toxins. Current bioinformatics approaches focus on detecting past HGT events by exploring phylogenetic trees or inconsistencies in genome composition. However, these techniques normally require finished and fully annotated genomes, as well as deviations large enough to be detected, and are thus not widely applicable. Especially in outbreak scenarios with HGT-mediated emergence of new pathogens, such as the enterohemorrhagic Escherichia coli outbreak in Germany in 2011, there is a need for fast and precise HGT detection. Next-generation sequencing (NGS) technologies facilitate rapid analysis of unknown pathogens but, to the best of our knowledge, no approach so far detects HGTs directly from NGS reads.

RESULTS We present Daisy, a novel mapping-based tool for HGT detection. Daisy determines HGT boundaries with split-read mapping and evaluates candidate regions using read pair and coverage information. Daisy successfully detects HGT regions with base pair resolution in both simulated and real data, and outperforms alternative approaches that rely on a genome assembly of the reads. We see our approach as a powerful complement for a comprehensive analysis of HGT in the context of NGS data.

AVAILABILITY AND IMPLEMENTATION Daisy is freely available from http://github.com/ktrappe/daisy

CONTACT renardb@rki.de

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
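To illustrate the kind of candidate evaluation mentioned under RESULTS, the following is a minimal sketch and not Daisy's actual implementation. It assumes that split-read mapping has already proposed a region on a donor reference, and it checks two kinds of evidence: coverage inside the region compared with the rest of the donor, and the number of read pairs whose donor-side mate falls inside the region. The function name, parameters, and thresholds are all hypothetical.

```python
from statistics import mean


def evaluate_candidate(coverage, start, end, pairs,
                       max_outside_ratio=0.5, min_pairs=2):
    """Score a hypothetical HGT candidate region [start, end) on a donor reference.

    coverage: per-base read depth along the donor reference (list of ints)
    pairs:    (acceptor_pos, donor_pos) tuples for read pairs that have one
              mate mapped to each reference (illustrative input format)
    The thresholds are arbitrary illustrative values, not taken from Daisy.
    """
    if not 0 <= start < end <= len(coverage):
        raise ValueError("candidate region lies outside the reference")

    # Mean depth inside the candidate and across the rest of the donor.
    inside = mean(coverage[start:end])
    rest = coverage[:start] + coverage[end:]
    outside = mean(rest) if rest else 0.0

    # Read pairs whose donor-side mate lands inside the candidate region.
    supporting = sum(1 for _, donor_pos in pairs if start <= donor_pos < end)

    # Accept when the region is covered, the rest of the donor is covered
    # clearly less, and enough read pairs link the two references.
    accepted = (inside > 0
                and outside <= max_outside_ratio * inside
                and supporting >= min_pairs)
    return {"inside": inside, "outside": outside,
            "supporting_pairs": supporting, "accepted": accepted}


if __name__ == "__main__":
    depth = [0] * 10 + [20] * 10 + [0] * 10
    links = [(5, 12), (7, 18), (9, 25)]
    print(evaluate_candidate(depth, 10, 20, links))
```

The sketch treats the two evidence types as independent filters. A real tool would work on alignment files rather than plain lists and would likely use a statistical model of coverage instead of fixed thresholds.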
