An Exact Nonparametric Method for Inferring Mosaic Structure in Sequence Triplets

Statistical tests for detecting mosaic structure or recombination among nucleotide sequences usually rely on identifying a pattern or signal that would be unlikely to appear under clonal reproduction. Dozens of such tests have been described, but many are hampered by long running times, confounding of selection and recombination, or an inability to isolate the mosaic-producing event. We introduce a test that is exact, nonparametric, rapidly computable, free of the infinite-sites assumption, able to distinguish between recombination and variation in mutation/fixation rates, and able to identify the breakpoints and sequences involved in the mosaic-producing event. Our test considers three sequences at a time: two parent sequences that may have recombined, with one or two breakpoints, to form the third sequence (the child sequence). Excess similarity of the child sequence to a candidate recombinant of the parents is a sign of recombination; we take the maximum value of this excess similarity as our test statistic $\Delta_{m,n,b}$. We present a method for rapidly calculating the distribution of $\Delta_{m,n,b}$ and demonstrate that its power is comparable to that of previous methods while its running time is much shorter, especially when detecting recombination in large data sets.
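
The abstract does not spell out how the statistic is computed, so the following Python sketch is only one plausible reading of it, not the authors' implementation. It assumes that each site where the parents differ contributes a step of +1 if the child matches the first parent and -1 if it matches the second, that the two-breakpoint statistic is the largest descent of the resulting walk, and that m and n count the up-steps and down-steps. The function names are hypothetical, and the brute-force enumeration stands in for the rapid calculation the abstract mentions; it is practical only for small m and n.

```python
from itertools import combinations


def informative_steps(parent_p, parent_q, child):
    """Return +1 where the child matches P only, -1 where it matches Q only.

    Sites where the parents agree, or where the child matches neither
    parent, are skipped (an assumption of this sketch).
    """
    steps = []
    for p, q, c in zip(parent_p, parent_q, child):
        if p == q:
            continue
        if c == p:
            steps.append(1)
        elif c == q:
            steps.append(-1)
    return steps


def max_descent(steps):
    """Largest drop from any earlier height of the walk to any later height.

    The walk starts at height 0, so a drop from the starting point counts.
    """
    height = peak = best = 0
    for s in steps:
        height += s
        peak = max(peak, height)
        best = max(best, peak - height)
    return best


def exact_p_value(m, n, observed):
    """P(max descent >= observed) when all orderings of m up-steps and
    n down-steps are treated as equally likely (assumed null model).

    Enumerates every one of the C(m+n, m) orderings, so it is exact
    but slow.
    """
    total = hits = 0
    for ups in combinations(range(m + n), m):
        up_positions = set(ups)
        walk = [1 if i in up_positions else -1 for i in range(m + n)]
        total += 1
        if max_descent(walk) >= observed:
            hits += 1
    return hits / total


# Toy triplet: the child looks like P, then Q, then P again.
P = "AAAAAAAA"
Q = "CCCCCCCC"
C = "AACCCCAA"
steps = informative_steps(P, Q, C)
m, n = steps.count(1), steps.count(-1)
stat = max_descent(steps)
print(stat, exact_p_value(m, n, stat))
```

In this toy case the descent captures a child that switches from P-like to Q-like and back; under the same assumptions, the opposite orientation could be examined by swapping the two parents.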
