RB-Finder: An Improved Distance-Based Sliding Window Method to Detect Recombination Breakpoints

Detecting recombination is an important step before inferring phylogenetic relationships, because recombinant sequences can mislead phylogenetic analysis. Reliable detection in turn supports a better understanding of pathogen evolution, more accurate genotyping, and advances in vaccine development. In this paper, we introduce RB-Finder, a fast and accurate distance-based window method for detecting recombination in a multiple sequence alignment. RB-Finder uses a more informative distance measure and a novel weighting strategy to reduce sensitivity to the choice of window size, which improves the accuracy of breakpoint detection. It is also faster than existing phylogeny-based methods because it does not need to construct and compare complex phylogenetic trees. Compared with Pruned-PDM, the current best method, RB-Finder is a few hundred times more efficient. Experimental evaluation on synthetic and biological datasets shows that RB-Finder is more accurate than existing phylogeny-based methods. We also show that the method has potential use in related applications such as genotyping.
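
To make the general idea of a distance-based window scan concrete, the following is a minimal sketch under stated assumptions. It is not RB-Finder's distance measure or weighting strategy, neither of which is specified in this abstract. It uses a plain mismatch proportion (p-distance), and the function names `p_distance`, `nearest_reference_by_window`, and `candidate_breakpoints` are hypothetical. For each window, the sketch records which reference sequence is closest to a query sequence, then reports the window centres where the closest reference changes.

```python
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of mismatched sites, skipping columns gapped in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_reference_by_window(
    alignment: Dict[str, str], query: str, window: int, step: int
) -> List[Tuple[int, str]]:
    """Slide a window along the alignment; return (window centre, nearest reference)."""
    q = alignment[query]
    refs = sorted(name for name in alignment if name != query)
    profile = []
    for start in range(0, len(q) - window + 1, step):
        end = start + window
        dists = {r: p_distance(q[start:end], alignment[r][start:end]) for r in refs}
        # Ties are broken by reference name order (an arbitrary illustrative choice).
        nearest = min(refs, key=lambda r: dists[r])
        profile.append((start + window // 2, nearest))
    return profile


def candidate_breakpoints(profile: List[Tuple[int, str]]) -> List[int]:
    """Window centres at which the nearest reference differs from the previous window."""
    return [pos for (_, prev), (pos, cur) in zip(profile, profile[1:]) if prev != cur]


# Toy alignment: the query matches reference A for 20 sites, then reference B.
toy = {"A": "A" * 40, "B": "C" * 40, "Q": "A" * 20 + "C" * 20}
profile = nearest_reference_by_window(toy, query="Q", window=10, step=4)
breaks = candidate_breakpoints(profile)
```

In this toy case the reported position is a window centre near the true switch at site 20, not the exact site. The resolution of such a naive scan depends on the window and step sizes, which illustrates the window size sensitivity that the abstract says RB-Finder aims to reduce.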
