Gene-wide identification of episodic selection.

We present BUSTED, a new approach to identifying gene-wide evidence of episodic positive selection, in which the non-synonymous substitution rate is transiently greater than the synonymous rate. BUSTED can be used either on an entire phylogeny (without requiring an a priori hypothesis regarding which branches are under positive selection) or on a pre-specified subset of foreground lineages (if a suitable a priori hypothesis is available). Selection is modeled as varying stochastically over branches and sites, and we propose a computationally inexpensive evidence metric for identifying sites subject to episodic positive selection on any foreground branch. We compare BUSTED with existing models on simulated and empirical data. An implementation is available at www.datamonkey.org/busted, with a widget allowing the interactive specification of foreground branches.
