Modelling the evolution of protein coding sequences sampled from Measurably Evolving Populations.

Models of nucleotide or amino acid sequence evolution that implement homogeneous and stationary Markov processes of substitution are mathematically convenient but are unlikely to capture the true complexity of evolution. Given the large amounts of data that next-generation sequencing promises, appropriate models of evolution are important, particularly when data are collected from ancient and sub-fossil remains, where changes in evolutionary parameters are the norm rather than the exception. In this paper, we describe a new codon-based model of evolution that applies to Measurably Evolving Populations (MEPs). An MEP is defined as a population in which a statistically significant accumulation of substitutions can be detected when sequences are obtained at different times. The new model of codon evolution permits changes to the substitution process over time, including changes to the intensity of selection and to the proportions of sites under different selective pressures. In our serial model of codon evolution, changes in the selective regime occur simultaneously across all lineages. Different regions of the protein may also evolve under distinct selective patterns. We illustrate the application of the new model to a dataset of HIV-1 sequences obtained from an infected individual before and after the commencement of antiretroviral therapy.
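To make the idea of a selective regime that changes at one time point on every lineage concrete, the following is a minimal sketch and not the parameterisation used in the paper, which the abstract does not specify. It assumes a Goldman and Yang (1994) style relative rate between codons, with a transition/transversion ratio `kappa` and a nonsynonymous/synonymous ratio `omega`, and a single change point `change_time`. The function and parameter names are hypothetical, and equilibrium codon frequencies are omitted.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def relative_rate(src, dst, kappa, omega):
    """Relative substitution rate src -> dst (assumed GY94-style form).

    Target codon frequencies are left out, so this is the rate only up to
    that multiplicative factor.
    """
    if CODE[src] == "*" or CODE[dst] == "*":
        return 0.0  # stop codons are excluded from the state space
    diffs = [(a, b) for a, b in zip(src, dst) if a != b]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes are allowed
    rate = kappa if frozenset(diffs[0]) in TRANSITIONS else 1.0
    if CODE[src] != CODE[dst]:
        rate *= omega  # nonsynonymous changes are scaled by omega
    return rate


def omega_at(time, change_time, omega_before, omega_after):
    """Selection intensity in force at a given time (hypothetical helper).

    The same change point applies to every lineage, which is the sense in
    which the regime changes simultaneously across the genealogy.
    """
    return omega_before if time < change_time else omega_after


def rate_at(src, dst, time, change_time, kappa, omega_before, omega_after):
    """Relative rate src -> dst under the regime in force at `time`."""
    omega = omega_at(time, change_time, omega_before, omega_after)
    return relative_rate(src, dst, kappa, omega)
```

With serially sampled sequences, a branch that spans the change point would use the earlier `omega` for the portion before `change_time` and the later `omega` for the remainder. Letting the proportions of sites in each selective class change as well would require a mixture over site classes, which this sketch leaves out.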

[1]  J. Margolick,et al.  Consistent Viral Evolutionary Changes Associated with the Progression of Human Immunodeficiency Virus Type 1 Infection , 1999, Journal of Virology.

[2]  R. Nielsen,et al.  Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. , 2002, Molecular biology and evolution.

[3]  J. Kingman On the genealogy of large populations , 1982, Journal of Applied Probability.

[4]  A. Rodrigo,et al.  Reconstructing genealogies of serial samples under the assumption of a molecular clock using serial-sample UPGMA. , 2000, Molecular biology and evolution.

[5]  Alexei J Drummond,et al.  Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. , 2002, Genetics.

[6]  S. Muse,et al.  A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. , 1994, Molecular biology and evolution.

[7]  Joseph P. Bielawski,et al.  Maximum likelihood methods for detecting adaptive evolution after gene duplication , 2004, Journal of Structural and Functional Genomics.

[8]  N. Saitou,et al.  The neighbor-joining method: a new method for reconstructing phylogenetic trees. , 1987, Molecular biology and evolution.

[9]  J. Hartigan,et al.  Statistical Analysis of Hominoid Molecular Evolution , 1987 .

[10]  N. Goldman,et al.  A codon-based model of nucleotide substitution for protein-coding DNA sequences. , 1994, Molecular biology and evolution.

[11]  M. Gouy,et al.  Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. , 1998, Molecular biology and evolution.

[12]  Z. Yang,et al.  On the use of nucleic acid sequences to infer early branchings in the tree of life. , 1995, Molecular biology and evolution.

[13]  Stéphane Guindon,et al.  Modeling the site-specific variation of selection patterns along lineages. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[14]  Yang Wang,et al.  Substitution Model of Sequence Evolution for the Human Immunodeficiency Virus Type 1 Subtype B gp120 Gene over the C2-V5 Region , 2001, Journal of Molecular Evolution.

[15]  Ziheng Yang,et al.  Maximum likelihood methods for detecting adaptive evolution after gene duplication. Gene and Genome Duplications and the Origin of Novel Gene Functions , 2003 .

[16]  Samuel Kotz,et al.  Some bivariate beta distributions , 2005 .

[17]  A. Rodrigo,et al.  Coalescent-Based Estimation of Population Parameters When the Number of Demes Changes over Time , 2006 .

[18]  E. G. Shpaer,et al.  Coalescent estimates of HIV-1 generation time in vivo. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[19]  Joseph Heled,et al.  The perils of plenty: what are we going to do with all these genes? , 2008, Philosophical Transactions of the Royal Society B: Biological Sciences.

[20]  Z. Yang,et al.  Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. , 2000, Molecular biology and evolution.

[21]  J. Felsenstein Evolutionary trees from DNA sequences: A maximum likelihood approach , 2005, Journal of Molecular Evolution.

[22]  Edward C. Holmes,et al.  Rates of Molecular Evolution in RNA Viruses: A Quantitative Phylogenetic Analysis , 2002, Journal of Molecular Evolution.

[23]  A. Rodrigo,et al.  The inference of stepwise changes in substitution rates using serial sequence samples. , 2001, Molecular biology and evolution.

[24]  M. Steel,et al.  Modeling the covarion hypothesis of nucleotide substitution. , 1998, Mathematical biosciences.

[25]  O. Pybus Model Selection and the Molecular Clock , 2006, PLoS biology.

[26]  Andrew Rambaut,et al.  Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies , 2000, Bioinform..

[27]  A. Zharkikh Estimation of evolutionary distances between nucleotide sequences , 1994, Journal of Molecular Evolution.

[28]  A. Rodrigo,et al.  Measurably evolving populations , 2003 .

[29]  Geoff Nicholls,et al.  Using Temporally Spaced Sequences to Simultaneously Estimate Migration Rates, Mutation Rate and Population Sizes in Measurably Evolving Populations , 2004, Genetics.

[30]  Yun-Xin Fu,et al.  Test of Genetical Isochronism for Longitudinal Samples of DNA Sequences , 2007, Genetics.

[31]  Allen G. Rodrigo,et al.  Immune-Mediated Positive Selection Drives Human Immunodeficiency Virus Type 1 Molecular Variation and Predicts Disease Duration , 2002, Journal of Virology.

[32]  R. Nielsen,et al.  Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. , 1998, Genetics.

[33]  N. Goldman,et al.  Codon-substitution models for heterogeneous selection pressure at amino acid sites. , 2000, Genetics.

[34]  Ziheng Yang,et al.  PAML: a program package for phylogenetic analysis by maximum likelihood , 1997, Comput. Appl. Biosci..

[35]  Alexander F. Auch,et al.  Metagenomics to Paleogenomics: Large-Scale Sequencing of Mammoth DNA , 2006, Science.

[36]  Influence of CD4+ T cell counts on viral evolution in HIV-infected individuals undergoing suppressive HAART. , 2004, Virology.