A genetic algorithm approach to detecting lineage-specific variation in selection pressure.

The ratio of nonsynonymous (dN) to synonymous (dS) substitution rates, omega, provides a measure of selection at the protein level. Models have been developed that allow omega to vary among lineages, but they require the lineages on which differential selection has acted to be specified a priori. We propose a genetic algorithm approach that assigns the lineages in a phylogeny to a fixed number of omega classes, allowing selection pressure to vary without a priori specification of particular lineages. This approach can identify models that fit better than a single-ratio model and that, in an information-theoretic sense, also fit better than a fully local model, in which every lineage is assumed to evolve under its own value of omega, while using far fewer parameters. By averaging over the models that explain the data reasonably well, we can assess how robust our conclusions are to uncertainty in model estimation. Our approach can also be used to compare models in which branch classes are specified a priori against a wide range of credible models. We illustrate our methods on primate lysozyme sequences and compare the results with those of previous methods applied to the same data sets.
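The model-averaging step can be illustrated with a minimal sketch. It assumes that each candidate model is an assignment of branches to omega classes, that models are scored with the small-sample corrected Akaike information criterion (AICc), and that Akaike weights are used for averaging. The abstract says only that comparison is information theoretic, so the choice of AICc, the sample-size convention, all function names, and the toy numbers below are assumptions made for illustration, not the authors' implementation or results.

```python
import math


def aicc(log_likelihood, n_params, sample_size):
    """Small-sample corrected AIC (lower is better).

    Assumption: sample_size is whatever count the analyst treats as the
    number of observations (for example, the number of codon sites).
    """
    aic = 2.0 * n_params - 2.0 * log_likelihood
    correction = 2.0 * n_params * (n_params + 1) / (sample_size - n_params - 1)
    return aic + correction


def akaike_weights(scores):
    """Convert information-criterion scores into normalized model weights."""
    best = min(scores)
    relative = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(relative)
    return [r / total for r in relative]


def model_averaged_omega(models, sample_size):
    """Average the per-branch omega over a set of candidate models.

    Each model is a dict with hypothetical keys:
      'assignment'     - class index for every branch
      'omegas'         - omega value of every class
      'log_likelihood' - maximized log-likelihood
      'n_params'       - number of estimated parameters
    """
    scores = [aicc(m["log_likelihood"], m["n_params"], sample_size) for m in models]
    weights = akaike_weights(scores)
    n_branches = len(models[0]["assignment"])
    averaged = [0.0] * n_branches
    for model, weight in zip(models, weights):
        for branch, cls in enumerate(model["assignment"]):
            averaged[branch] += weight * model["omegas"][cls]
    return weights, averaged


# Invented toy candidates for a tree with four branches (not real results).
toy_models = [
    {"assignment": [0, 0, 0, 0], "omegas": [0.6],
     "log_likelihood": -1050.0, "n_params": 8},
    {"assignment": [0, 1, 0, 0], "omegas": [0.5, 3.0],
     "log_likelihood": -1044.0, "n_params": 9},
    {"assignment": [0, 1, 1, 0], "omegas": [0.4, 2.0],
     "log_likelihood": -1045.5, "n_params": 9},
]
weights, averaged = model_averaged_omega(toy_models, sample_size=130)
```

In a sketch like this, a branch whose model-averaged omega stays above 1 across the well-supported models is less sensitive to the choice of any single model, which is the kind of robustness check the abstract describes.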
