A Generalized Mechanistic Codon Model

Models of codon evolution have attracted particular interest because of their unique ability to detect selective forces and their good fit when applied to sequence evolution. We describe here a novel approach for modeling codon evolution, which is based on the Kronecker product of matrices. The 61 × 61 codon substitution rate matrix is created using the Kronecker product of three 4 × 4 nucleotide substitution matrices, the equilibrium frequencies of codons, and the selection rate parameter. The entries of the nucleotide substitution matrices and the selection rate are treated as parameters of the model, which are optimized by maximum likelihood. Our fully mechanistic model allows the instantaneous substitution matrix between codons to be fully estimated with only 19 parameters instead of 3,721, by using the biological interdependence existing between positions within codons. We illustrate the properties of our model using computer simulations and assess its relevance by comparing the AICc values of our model and other models of codon evolution on simulations and a large range of empirical data sets. We show that our model fits most biological data sets better than current codon models. Furthermore, the parameters in our model can be interpreted in a similar way to the exchangeability rates found in empirical codon models.
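The construction described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes that each 4 × 4 nucleotide matrix is symmetric with a unit diagonal (meaning "no change at this position"), that stop codons are removed under the standard genetic code, that each rate is multiplied by the frequency of the target codon, and that the selection parameter (called `omega` here) scales nonsynonymous changes. The function names `nucleotide_matrix` and `codon_rate_matrix` are hypothetical.

```python
import numpy as np

NUCS = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
      "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")


def nucleotide_matrix(rates):
    """Symmetric 4 x 4 matrix from 6 off-diagonal values, with a unit diagonal.

    Assumption of this sketch: the unit diagonal stands for "no change at
    this codon position", so the Kronecker product below also yields rates
    for codon pairs that differ at one, two, or three positions.
    """
    m = np.zeros((4, 4))
    m[np.triu_indices(4, k=1)] = rates
    return m + m.T + np.eye(4)


def codon_rate_matrix(n1, n2, n3, pi, omega):
    """Assemble a 61 x 61 codon rate matrix (hypothetical parameterization)."""
    # 64 x 64 matrix; row/column index = 16 * pos1 + 4 * pos2 + pos3.
    k = np.kron(np.kron(n1, n2), n3)

    # Keep only the 61 sense codons.
    sense = [i for i, a in enumerate(AA) if a != "*"]
    k = k[np.ix_(sense, sense)]

    # Scale nonsynonymous changes by omega and weight by target frequency.
    aa = np.array([AA[i] for i in sense])
    nonsyn = aa[:, None] != aa[None, :]
    q = k * pi[None, :] * np.where(nonsyn, omega, 1.0)

    # Diagonal is set so that every row sums to zero.
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q


rng = np.random.default_rng(0)
n1, n2, n3 = (nucleotide_matrix(rng.uniform(0.1, 1.0, 6)) for _ in range(3))
pi = np.full(61, 1.0 / 61)  # uniform codon frequencies, for illustration only
q = codon_rate_matrix(n1, n2, n3, pi, omega=0.3)
print(q.shape)
```

Because the three nucleotide matrices are shared across all codon pairs, the full codon matrix is determined by a small number of free values rather than by one parameter per entry, which is the point the abstract makes about parameter count.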
