Stepwise detection of recombination breakpoints in sequence alignments

MOTIVATION: We propose a stepwise approach to identify recombination breakpoints in a sequence alignment. The approach can be applied to any recombination detection method that uses a permutation test and provides estimates of breakpoints.

RESULTS: We illustrate the approach with analyses of a simulated dataset and of alignments of real data from HIV-1 and human chromosome 7. The simulation results compare the statistical properties of one-step and two-step procedures. A two-step procedure finds more breakpoints than a single application of a given method, particularly at higher recombination rates. At those rates, the additional breakpoints are located at the cost of only a slight increase in the number of falsely declared breakpoints. However, a large proportion of breakpoints still go undetected.

AVAILABILITY: A makefile and C source code for phylogenetic profiling and the maximum chi-squared method, tested with the gcc compiler on Linux and Windows XP, are available at http://stat-db.stat.sfu.ca/stepwise/

CONTACT: jgraham@stat.sfu.ca
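As an illustration of the kind of method the approach builds on, the following is a minimal Python sketch of a pairwise maximum chi-squared scan with a permutation test. It is not the authors' C implementation. The function names (`chi2_2x2`, `max_chi2`, `permutation_pvalue`), the 0/1 difference-vector input, and the choice to scan every split point without a sliding window are assumptions made for this example. It covers a single application of the method only and does not implement the stepwise procedure.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence
    return n * (a * d - b * c) ** 2 / denom


def max_chi2(diffs):
    """Scan all split points of a 0/1 difference vector.

    diffs[i] is 1 if the two sequences differ at site i, else 0.
    Returns (largest statistic, split k), where the candidate
    breakpoint lies between sites k-1 and k.
    """
    n = len(diffs)
    total = sum(diffs)
    best, best_k = 0.0, None
    left = 0  # number of differences to the left of the split
    for k in range(1, n):
        left += diffs[k - 1]
        right = total - left
        stat = chi2_2x2(left, k - left, right, (n - k) - right)
        if stat > best:
            best, best_k = stat, k
    return best, best_k


def permutation_pvalue(diffs, n_perm=999, seed=0):
    """Permutation test: shuffle site order and recompute the maximum."""
    rng = random.Random(seed)
    observed, k = max_chi2(diffs)
    shuffled = list(diffs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_chi2(shuffled)[0] >= observed:
            hits += 1
    # The +1 terms count the observed arrangement as one permutation.
    return observed, k, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical pair of sequences: identical on the left half,
    # different at every site on the right half.
    example = [0] * 20 + [1] * 20
    print(permutation_pvalue(example))
```

The permutation step is what makes such a method eligible for the approach described above: it supplies a significance level for the strongest candidate, and the scan supplies the estimated breakpoint position.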
