Exploring lateral genetic transfer among microbial genomes using TF-IDF

Many microbes can acquire genetic material from their environment and incorporate it into their genome, a process known as lateral genetic transfer (LGT). Computational approaches have been developed to detect genomic regions of lateral origin, but they typically lack sensitivity, the ability to distinguish donor from recipient, and scalability to very large datasets. To address these issues we have introduced an alignment-free method based on a concept from document analysis, term frequency-inverse document frequency (TF-IDF). Here we examine the performance of TF-IDF on three empirical datasets: 27 genomes of Escherichia coli and Shigella, 110 genomes of enteric bacteria, and 143 genomes across 12 bacterial and three archaeal phyla. We investigate the effect of k-mer size, gap size, and delineation of groups on the inference of genomic regions of lateral origin, and find an interplay among these parameters and sequence divergence. Because TF-IDF identifies donor groups and delineates regions of lateral origin within recipient genomes, aggregating these regions by gene enables us to explore, for the first time, the mosaic nature of lateral genes, including the multiplicity of biological sources, ancestry of transfer, and over-writing by subsequent transfers. We carry out Gene Ontology enrichment tests to investigate which biological processes are potentially affected by LGT.
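
To make the roles of k-mer size, gap size, and group delineation concrete, the following is a minimal sketch of the general idea and not the authors' implementation. It assumes each group of genomes is treated as one "document" and each k-mer as a "term", scores k-mers with a plain TF-IDF weight, and merges nearby foreign-group k-mers in a query genome into candidate regions. The function names (`tfidf_by_group`, `candidate_regions`), the scoring formula, and the merging rule are illustrative assumptions, and the sequences are toy data.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_by_group(groups, k):
    """Score k-mers per group (assumed scheme: one group = one document).

    groups maps a group name to a list of sequences.
    Returns {group: {kmer: tf * idf}}.
    """
    counts = {g: Counter(km for s in seqs for km in kmers(s, k))
              for g, seqs in groups.items()}
    n_groups = len(counts)
    # Document frequency: in how many groups does each k-mer occur?
    doc_freq = Counter(km for c in counts.values() for km in c)
    scores = {}
    for g, c in counts.items():
        total = sum(c.values())
        scores[g] = {km: (n / total) * math.log(n_groups / doc_freq[km])
                     for km, n in c.items()}
    return scores


def candidate_regions(query, own_group, scores, k, gap):
    """Find stretches of query whose k-mers are characteristic of another group.

    A k-mer is a hit if it is absent from own_group and has a positive
    score in some other group; the highest-scoring group is the putative
    donor. Hits with the same donor are merged when the next k-mer starts
    at most `gap` positions after the current region ends.
    Returns a list of (start, end, donor) with end exclusive.
    """
    hits = []
    for pos, km in enumerate(kmers(query, k)):
        if km in scores[own_group]:
            continue
        donors = [g for g in scores
                  if g != own_group and scores[g].get(km, 0.0) > 0.0]
        if donors:
            hits.append((pos, max(donors, key=lambda g: scores[g][km])))
    regions = []
    for pos, donor in hits:
        if regions and regions[-1][2] == donor and pos - regions[-1][1] <= gap:
            regions[-1][1] = pos + k  # extend the current region
        else:
            regions.append([pos, pos + k, donor])
    return [tuple(r) for r in regions]


# Toy example: a group-A-like query carrying an 8-base insert from group B.
groups = {"A": ["ACACACACACACACACACAC"], "B": ["GGTTGGTTCCAAGGTTCCAA"]}
query = "ACACACAC" + "GGTTCCAA" + "ACACACAC"
scores = tfidf_by_group(groups, k=4)
print(candidate_regions(query, "A", scores, k=4, gap=2))  # [(8, 16, 'B')]
```

In this sketch a larger `k` makes k-mers more group-specific but less tolerant of sequence divergence, a larger `gap` bridges mismatching positions inside a region, and the choice of groups determines which k-mers count as foreign, which illustrates why these parameters interact.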

[58]  Mark A Ragan,et al.  Within-species lateral genetic transfer and the evolution of transcriptional regulation in Escherichia coli and Shigella , 2011, BMC Genomics.