A Simple and Robust Statistical Test for Detecting the Presence of Recombination

Recombination is a powerful evolutionary force that merges historically distinct genotypes. Yet the extent of recombination within many organisms is unknown, and even determining whether it is present in a set of homologous sequences is difficult. Here we develop a new statistic, Φw, that can be used to test for recombination. We show through simulation that our test discriminates effectively between the presence and absence of recombination, even in diverse situations such as exponential growth (star-like topologies) and patterns of substitution rate correlation. Several other tests tend to underestimate the presence of recombination under strong population growth: Max χ², NSS, a coalescent-based likelihood permutation test (from LDHat), and the correlation of linkage disequilibrium (both r² and |D′|) with distance. Moreover, both Max χ² and NSS falsely infer the presence of recombination under a simple model of mutation rate correlation. Results on empirical data show that our test can detect recombination between closely as well as distantly related samples, regardless of the suspected rate of recombination. The results suggest that Φw is one of the best approaches for distinguishing recurrent mutation from recombination in a wide variety of circumstances.
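
To make one of the comparison tests named above concrete, the following is a minimal sketch of the quantities behind the "correlation of linkage disequilibrium with distance" approach. It is not the Φw statistic and not the authors' implementation. The function names (`ld_stats`, `ld_distance_correlation`) are hypothetical, and the code assumes biallelic sites coded 0/1, phased haplotypes, and no missing data.

```python
from itertools import combinations
from math import sqrt


def ld_stats(site_a, site_b):
    """Return (r2, abs_d_prime) for two biallelic sites coded 0/1.

    Assumption: both sites are polymorphic and the two lists give the
    alleles of the same haplotypes in the same order.
    """
    n = len(site_a)
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(a and b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b                       # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    r2 = d * d / denom
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ld_distance_correlation(haplotypes, positions):
    """Correlate pairwise r2 with the physical distance between sites.

    haplotypes: list of rows, one per sampled sequence, each a list of 0/1.
    positions:  coordinate of each site (column) along the alignment.
    """
    sites = list(zip(*haplotypes))             # transpose rows into site columns
    r2_values, distances = [], []
    for i, j in combinations(range(len(sites)), 2):
        r2, _ = ld_stats(sites[i], sites[j])
        r2_values.append(r2)
        distances.append(abs(positions[j] - positions[i]))
    return pearson(r2_values, distances)
```

A negative correlation (linkage disequilibrium decaying with distance) is the pattern such tests read as evidence of recombination. In practice significance is usually assessed by permuting site positions, which this sketch omits.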
