gmos: Rapid Detection of Genome Mosaicism over Short Evolutionary Distances

Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. Existing methods for detecting recombination are aimed primarily at viral genomes or sets of loci, because the computational cost of their underlying statistical models often makes comparison of complete prokaryotic genomes impractical. Alignment-free methods are more efficient, but they cannot map (align) a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure), a new program that determines the mosaic structure of a query genome relative to a set of closely related subject genomes. The program first computes local alignments between the query and subject genomes and then reconstructs the query mosaic structure by choosing the best local alignment for each query region. To keep the analysis fast, the program relies mostly on pairwise alignments and constructs multiple sequence alignments over short overlapping subject regions only when necessary. This fine-tuned implementation achieves an efficiency comparable to that of an alignment-free tool. The program performs well on simulated and real data sets of closely related genomes and can be used for fast recombination detection, for instance when a new prokaryotic pathogen is discovered. As an example, gmos was used to detect genome mosaicism in a pathogenic Enterococcus faecium strain compared to seven closely related genomes. The analysis took less than two minutes on a single 2.1 GHz processor. The output is available in FASTA format and can be visualized using an accessory program, gmosDraw (freely available with gmos).
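
The second step described above, choosing the best local alignment for each query region, can be illustrated with a minimal Python sketch. This is not the gmos implementation: the names `Hit`, `Segment`, and `mosaic` are hypothetical, the local alignments are assumed to be already computed, and "best" is assumed here to mean simply the highest raw alignment score among the alignments that cover a region. The sketch also omits the multiple sequence alignment step that gmos applies to overlapping subject regions.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    """A local alignment of a subject genome to the query (hypothetical type)."""
    subject: str
    start: int    # query start, 0-based, inclusive
    end: int      # query end, exclusive
    score: float  # alignment score (assumed comparable across hits)


class Segment(NamedTuple):
    """One piece of the query mosaic; subject is None where nothing aligns."""
    start: int
    end: int
    subject: Optional[str]


def mosaic(hits: List[Hit], query_len: int) -> List[Segment]:
    """Assign each query region to the subject of its best-scoring covering hit."""
    # Alignment boundaries split the query into elementary intervals.
    cuts = sorted({0, query_len,
                   *(h.start for h in hits),
                   *(h.end for h in hits)})
    segments: List[Segment] = []
    for lo, hi in zip(cuts, cuts[1:]):
        covering = [h for h in hits if h.start <= lo and h.end >= hi]
        best = max(covering, key=lambda h: h.score, default=None)
        subject = best.subject if best else None
        # Merge with the previous segment when the chosen subject is unchanged.
        if segments and segments[-1].subject == subject:
            segments[-1] = Segment(segments[-1].start, hi, subject)
        else:
            segments.append(Segment(lo, hi, subject))
    return segments


if __name__ == "__main__":
    example = [Hit("A", 0, 60, 100.0),
               Hit("B", 40, 100, 150.0),
               Hit("A", 90, 120, 50.0)]
    for seg in mosaic(example, query_len=130):
        print(seg)
```

Comparing raw scores between alignments of different lengths is a simplification made for brevity; a length-normalized score or percent identity could be substituted in `max` without changing the rest of the sketch.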
