Estimating Diversifying Selection and Functional Constraint in the Presence of Recombination

Models of molecular evolution that incorporate the ratio of nonsynonymous to synonymous polymorphism (dN/dS ratio) as a parameter can be used to identify sites that are under diversifying selection or functional constraint in a sample of gene sequences. However, when there has been recombination in the evolutionary history of the sequences, no single phylogenetic tree describes that history, and inference based on a single reconstructed tree can give misleading results. In the presence of high levels of recombination, the identification of sites experiencing diversifying selection can suffer from a false-positive rate as high as 90%. We present a model that uses a population genetics approximation to the coalescent with recombination, and we use reversible-jump Markov chain Monte Carlo (MCMC) to perform Bayesian inference on both the dN/dS ratio and the recombination rate, allowing each to vary along the sequence. We demonstrate that the method has the power to detect variation in the dN/dS ratio and the recombination rate and does not suffer from a high false-positive rate. We use the method to analyze the porB gene of Neisseria meningitidis and verify the inferences using prior sensitivity analysis and model criticism techniques.
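To make the distinction behind the dN/dS ratio concrete, the following minimal sketch classifies codon differences between two aligned coding sequences as synonymous or nonsynonymous under the standard genetic code. It is an illustration only and not the method described above: it ignores recombination, does not normalize by the number of synonymous and nonsynonymous sites (so it does not yield a dN/dS estimate), and skips codons that differ at more than one position. The function name `classify_codon_differences` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of single-nucleotide codon differences.

    Hypothetical helper for illustration: raw counts only, with no
    per-site normalization and no correction for multiple substitutions.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff != 1:
            continue  # identical codons, or several changes with an ambiguous pathway
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: synonymous difference
        else:
            nonsyn += 1   # different amino acid: nonsynonymous difference
    return syn, nonsyn


# TTT -> TTC keeps phenylalanine; GGA -> GAA changes glycine to glutamate.
print(classify_codon_differences("ATGTTTGGA", "ATGTTCGAA"))  # (1, 1)
```

Pairwise counts like these treat every site as sharing one history. That is the assumption that breaks down when the sequences have recombined.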
