Confounding Factors in HGT Detection: Statistical Error, Coalescent Effects, and Multiple Solutions

Prokaryotic organisms share genetic material across species boundaries by means of a process known as horizontal gene transfer (HGT). This process has great significance for understanding prokaryotic genome diversification and unraveling their complexities. Phylogeny-based detection of HGT is one of the most commonly used methods for this task, and is based on the fundamental fact that HGT may cause gene trees to disagree with one another, as well as with the species phylogeny. Using these methods, we can compare gene and species trees, and infer a set of HGT events to reconcile the differences among these trees. In this paper, we address three factors that confound the detection of the true HGT events, including the donors and recipients of horizontally transferred genes. First, we study experimentally the effects of error in the estimated gene trees (statistical error) on the accuracy of inferred HGT events. Our results indicate that statistical error leads to overestimation of the number of HGT events, and that HGT detection methods should be designed with unresolved gene trees in mind. Second, we demonstrate, both theoretically and empirically, that based on topological comparison alone, the number of HGT scenarios that reconcile a pair of species/gene trees may be exponential. This number may be reduced when branch lengths in both trees are estimated correctly. This set of results implies that in the absence of additional biological information, and/or a biological model of how HGT occurs, multiple HGT scenarios must be sought, and efficient strategies for how to enumerate such solutions must be developed. Third, we address the issue of lineage sorting, how it confounds HGT detection, and how to incorporate it with HGT into a single stochastic framework that distinguishes between the two events by extending population genetics theories. This result is very important, particularly when analyzing closely related organisms, where coalescent effects may not be ignored when reconciling gene trees. In addition to these three confounding factors, we consider the problem of enumerating all valid coalescent scenarios that constitute plausible species/gene tree reconciliations, and develop a polynomial-time dynamic programming algorithm for solving it. This result bears great significance on reducing the search space for heuristics that seek reconciliation scenarios. Finally, we show, empirically, that the locality of incongruence between a pair of trees has an impact on the numbers of HGT and coalescent reconciliation scenarios.

[1]  T. D. Read,et al.  Role of Mobile DNA in the Evolution of Vancomycin-Resistant Enterococcus faecalis , 2003, Science.

[2]  C. J-F,et al.  THE COALESCENT , 1980 .

[3]  N. Moran,et al.  From Gene Trees to Organismal Phylogeny in Prokaryotes:The Case of the γ-Proteobacteria , 2003, PLoS biology.

[4]  Eric Bapteste,et al.  Deduction of probable events of lateral gene transfer through comparison of phylogenetic trees by recursive consolidation and rearrangement , 2005, BMC Evolutionary Biology.

[5]  N. Takahata Gene genealogy in three related populations: consistency probability between gene and population trees. , 1989, Genetics.

[6]  M. P. Cummings,et al.  PAUP* Phylogenetic analysis using parsimony (*and other methods) Version 4 , 2000 .

[7]  W. Doolittle,et al.  How big is the iceberg of which organellar genes in nuclear genomes are but the tip? , 2003, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[8]  Richard R. Hudson,et al.  TESTING THE CONSTANT‐RATE NEUTRAL ALLELE MODEL WITH PROTEIN SEQUENCE DATA , 1983, Evolution; international journal of organic evolution.

[9]  Derrick J. Zwickl,et al.  Increased taxon sampling greatly reduces phylogenetic error. , 2002, Systematic biology.

[10]  H. Ochman,et al.  Lateral gene transfer and the nature of bacterial innovation , 2000, Nature.

[11]  Andrew Rambaut,et al.  Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees , 1997, Comput. Appl. Biosci..

[12]  R. Macaden,et al.  Uropathogenic Escherichia coli--a preliminary study. , 2003, Indian journal of pathology & microbiology.

[13]  Kevin Atteson,et al.  The Performance of Neighbor-Joining Methods of Phylogenetic Reconstruction , 1999, Algorithmica.

[14]  R. Hudson Properties of a neutral allele model with intragenic recombination. , 1983, Theoretical population biology.

[15]  W. Ewens Mathematical Population Genetics , 1980 .

[16]  C. Ouzounis,et al.  The balance of driving forces during genome evolution in prokaryotes. , 2003, Genome research.

[17]  F. Blattner,et al.  Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[18]  Charles Semple,et al.  On the Computational Complexity of the Rooted Subtree Prune and Regraft Distance , 2005 .

[19]  Luay Nakhleh,et al.  Integrating Sequence and Topology for Efficient and Accurate Detection of Horizontal Gene Transfer , 2008, RECOMB-CG.

[20]  Luay Nakhleh,et al.  Techniques for Assessing Phylogenetic Branch Support: A Performance Study , 2005, APBC.

[21]  F. Tajima Evolutionary relationship of DNA sequences in finite populations. , 1983, Genetics.

[22]  W. Maddison Gene Trees in Species Trees , 1997 .

[23]  Tandy J. Warnow,et al.  Reconstructing reticulate evolution in species: theory and practice , 2004, RECOMB.

[24]  N. Moran,et al.  Phylogenetics and the Cohesion of Bacterial Genomes , 2003, Science.

[25]  M. Kimura The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. , 1969, Genetics.

[26]  Leon Goldovsky,et al.  The net of life: reconstructing the microbial phylogenetic network. , 2005, Genome research.

[27]  N. Saitou,et al.  The neighbor-joining method: a new method for reconstructing phylogenetic trees. , 1987, Molecular biology and evolution.

[28]  THOMAS F. TURNER Evolutionary Genetics: Concepts and Case Studies , 2007 .

[29]  Nicholas Hamilton,et al.  Phylogenetic identification of lateral genetic transfer events , 2006, BMC Evolutionary Biology.

[30]  Michael T. Hallett,et al.  Efficient algorithms for lateral gene transfer problems , 2001, RECOMB.

[31]  Tandy J. Warnow,et al.  Phylogenetic networks: modeling, reconstructibility, and accuracy , 2004, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[32]  Luay Nakhleh,et al.  RIATA-HGT: A Fast and Accurate Heuristic for Reconstructing Horizontal Gene Transfer , 2005, COCOON.

[33]  Bin Ma,et al.  From Gene Trees to Species Trees , 2000, SIAM J. Comput..

[34]  Vladimir Makarenkov,et al.  T-REX: reconstructing and visualizing phylogenetic trees and reticulation networks , 2001, Bioinform..

[35]  Noah A Rosenberg,et al.  The probability of topological concordance of gene trees and species trees. , 2002, Theoretical population biology.

[36]  Michael T. Hallett,et al.  Towards Identifying Lateral Gene Transfer Events , 2002, Pacific Symposium on Biocomputing.

[37]  Jason B. Wolf,et al.  Evolutionary genetics : concepts and case studies , 2006 .