The Branch-Site Test of Positive Selection Is Surprisingly Robust but Lacks Power under Synonymous Substitution Saturation and Variation in GC

Positive selection is widely estimated from protein-coding sequence alignments using the nonsynonymous-to-synonymous substitution rate ratio ω. Increasingly elaborate codon models are used in a likelihood framework for this estimation. Although there is widespread concern about the robustness of ω estimation, more effort is needed to quantify this robustness, especially for complex models. Here, we focused on the branch-site codon model and investigated its robustness on a large set of simulated data. First, we investigated the impact of sequence divergence. We found evidence that the synonymous substitution rate (dS) is underestimated for values as small as 0.5, with a slight increase in false positives for the branch-site test. As dS increases further, its underestimation worsens, but false positives decrease. Interestingly, the detection of true positives follows a similar distribution, with a maximum at intermediate values of dS. Thus, high dS is more of a concern for loss of power (false negatives) than for false positives of the test. Second, we investigated the impact of GC content. We found no significant difference in false positives between high-GC (up to ∼80%) and low-GC (∼30%) genes. Moreover, neither shifts in GC content on a specific branch nor major shifts in GC along the gene sequence generate many false positives. Our results confirm that the branch-site test is very conservative.
