xenoGI: reconstructing the history of genomic island insertions in clades of closely related bacteria

Background Genomic islands play an important role in microbial genome evolution, providing a mechanism for strains to adapt to new ecological conditions. A variety of computational methods, both genome-composition based and comparative, have been developed to identify them. Some of these methods are explicitly designed to work in single strains, while others make use of multiple strains. In general, existing methods do not identify islands in the context of the phylogeny in which they evolved. Even multiple-strain approaches are best suited to identifying genomic islands that are present in one strain but absent in others. They do not automatically recognize islands that are shared between some strains in the clade, nor do they determine the branch of the phylogenetic tree on which these islands inserted.
Results We have developed a software package, xenoGI, that identifies genomic islands and maps their origin within a clade of closely related bacteria, determining which branch they inserted on. It takes as input a set of sequenced genomes and a tree specifying their phylogenetic relationships. Making heavy use of synteny information, the package builds gene families in a species-tree-aware way, and then attempts to combine into islands those families whose members are adjacent and whose most recent common ancestor is shared. The package provides a variety of text-based analysis functions, as well as the ability to export genomic islands into formats suitable for viewing in a genome browser. We demonstrate the capabilities of the package with several examples from enteric bacteria, including an examination of the evolution of the acid fitness island in the genus Escherichia. In addition, we use output from simulations and a set of known genomic islands from the literature to show that xenoGI can accurately identify genomic islands and place them on a phylogenetic tree.
Conclusions xenoGI is an effective tool for studying the history of genomic island insertions in a clade of microbes. It identifies genomic islands, and determines which branch they inserted on within the phylogenetic tree for the clade. Such information is valuable because it helps us understand the adaptive path that has produced living species. Given the large and growing number of sequenced microbial genomes, this sort of analysis will become increasingly useful in the future.
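
The island-building idea described in the Results (combine families whose members are adjacent and whose most recent common ancestor is shared) can be illustrated with a minimal sketch. This is not xenoGI's code or API: the tree encoding, the family dictionaries, the function names (`leaves`, `mrca`, `adjacent`, `build_islands`), and the `max_gap` adjacency threshold are all assumptions made for illustration, and gene positions are simplified to integer indices along a single chromosome.

```python
from collections import defaultdict

# Hypothetical tree encoding: (name, child, child, ...); a leaf is (name,).
def leaves(node):
    """Set of strain names at the tips below this node."""
    name, *children = node
    if not children:
        return {name}
    return set().union(*(leaves(child) for child in children))

def mrca(node, strains):
    """Name of the deepest node whose tips include every strain in `strains`."""
    for child in node[1:]:
        if strains <= leaves(child):
            return mrca(child, strains)
    return node[0]

def adjacent(fam_a, fam_b, max_gap=1):
    """True if the two families lie within max_gap genes in every shared strain."""
    shared = fam_a.keys() & fam_b.keys()
    return bool(shared) and all(abs(fam_a[s] - fam_b[s]) <= max_gap for s in shared)

def build_islands(tree, families, max_gap=1):
    """Group families by MRCA node, then merge adjacent ones (single linkage)."""
    by_node = defaultdict(list)
    for fam_id, members in families.items():
        by_node[mrca(tree, set(members))].append(fam_id)

    islands = []
    for node, fam_ids in by_node.items():
        clusters = []
        for fam_id in sorted(fam_ids):
            # Existing clusters that contain a family adjacent to this one.
            linked = [c for c in clusters
                      if any(adjacent(families[fam_id], families[o], max_gap) for o in c)]
            merged = {fam_id}.union(*linked)
            clusters = [c for c in clusters if c not in linked] + [merged]
        islands.extend((node, sorted(c)) for c in clusters)
    return sorted(islands)

# Toy input (assumed): strains A and B are sisters under node "i1"; C is the outgroup.
tree = ("root", ("i1", ("A",), ("B",)), ("C",))
families = {
    "fam1": {"A": 10, "B": 12},          # strain -> gene index on the chromosome
    "fam2": {"A": 11, "B": 13},          # next to fam1 in both strains
    "fam3": {"A": 50, "B": 60},          # same MRCA, but far away
    "fam4": {"A": 1, "B": 1, "C": 1},    # present in every strain (core gene)
}

for node, fams in build_islands(tree, families):
    print(node, fams)
```

In this toy input, `fam1` and `fam2` share the MRCA `i1` and sit next to each other in both strains, so they merge into one island placed on the branch leading to `i1`; `fam3` has the same MRCA but is not adjacent, so it stays separate. A real implementation would also need to handle contigs, gene orientation, duplications, and gene loss, none of which this sketch attempts.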
