HGTector: An automated method facilitating genome-wide discovery of putative horizontal gene transfers

A new computational method for rapid, exhaustive, genome-wide detection of horizontal gene transfer (HGT) was developed. It systematically analyzes BLAST hit distribution patterns in the context of a priori defined hierarchical evolutionary categories. Genes that fall beyond a series of statistically determined thresholds are identified as not adhering to the typical vertical history of the organisms in question, and are instead assigned a putative horizontal origin. Tests on simulated genomic data suggest that this approach effectively targets atypically distributed genes that are highly likely to be HGT-derived, and that it performs robustly compared to conventional BLAST-based approaches. The method was further tested on real genomic datasets, including Rickettsia genomes, and compared to previous studies. The results are consistent with those of currently employed categories of HGT prediction methods. In-depth analysis of both simulated and real genomic data suggests that the method is notably insensitive to stochastic events such as gene loss, rate variation and database error, which are common challenges for current methodology. An automated pipeline implementing this approach is publicly available at: https://github.com/DittmarLab/HGTector. The program is versatile, easily deployed and has low computational requirements, making it an effective tool for initial or standalone large-scale discovery of candidate HGT-derived genes.
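
The core idea, scoring each gene by how its BLAST hits are distributed across predefined evolutionary categories and then applying cutoffs, can be illustrated with a minimal Python sketch. This is not the HGTector implementation. The category names (`self`, `close`, `distal`), the taxon labels, the bit scores and the fixed cutoffs are all assumptions made for illustration; in particular, the sketch takes the cutoffs as plain arguments rather than determining them statistically as the method does.

```python
from collections import defaultdict


def group_weights(hits, category_of):
    """Sum the bit scores of one gene's BLAST hits per evolutionary category."""
    weights = defaultdict(float)
    for taxon, bitscore in hits:
        weights[category_of[taxon]] += bitscore
    return weights


def flag_putative_hgt(genes, category_of, close_cutoff, distal_cutoff):
    """Return genes with few close-relative hits but strong distal hits.

    The cutoffs are hypothetical fixed values passed in by the caller;
    they stand in for the statistically determined thresholds.
    """
    flagged = []
    for gene, hits in genes.items():
        w = group_weights(hits, category_of)
        if w["close"] < close_cutoff and w["distal"] >= distal_cutoff:
            flagged.append(gene)
    return flagged


# Illustrative (made-up) hierarchy: the query genus, its wider order,
# and everything else.
category_of = {
    "R_typhi": "self",
    "Anaplasma": "close",
    "Wolbachia": "close",
    "E_coli": "distal",
    "Bacillus": "distal",
}

# Illustrative (made-up) BLAST hits per gene: (subject taxon, bit score).
genes = {
    "geneA": [("R_typhi", 500), ("Anaplasma", 300), ("Wolbachia", 280), ("E_coli", 90)],
    "geneB": [("R_typhi", 450), ("E_coli", 400), ("Bacillus", 380)],
}

candidates = flag_putative_hgt(genes, category_of, close_cutoff=100, distal_cutoff=300)
print(candidates)
```

Here `geneA` has strong hits among close relatives and is treated as vertically inherited, whereas `geneB` has no hits among close relatives but strong hits in distal taxa, so it is reported as a candidate.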
