Linking Great Apes Genome Evolution across Time Scales Using Polymorphism-Aware Phylogenetic Models

The genomes of related species contain valuable information about the evolutionary history of those taxa. Great apes in particular show variation in evolutionary patterns along their genomes. However, great ape data also bring new challenges, such as incomplete lineage sorting and shared ancestral polymorphisms. Previous methods for genome-scale analysis are restricted to very few individuals or cannot disentangle the contributions of mutation rates and fixation biases. This limits both the understanding of these forces and the detection of regions affected by selection. Here, we present a new model designed to estimate mutation rates and fixation biases from genetic variation within and between species. We relax the assumption of instantaneous substitutions, modeling each substitution as a mutational event followed by gradual fixation. The model therefore accounts directly for shared ancestral polymorphisms and incomplete lineage sorting. We analyze genome-wide synonymous site alignments of human, chimpanzee, and two orangutan species, including data from several individuals for each taxon. We estimate mutation rates and the intensity of GC-biased gene conversion. We find that both mutation rates and biased gene conversion vary with GC content. We also find lineage-specific differences, with weaker fixation biases in the orangutan species, suggesting a reduced historical effective population size. Finally, our results are consistent with directional selection acting on coding sequences in relation to exonic splicing enhancers.
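The idea of a substitution as a mutation followed by gradual fixation can be illustrated with a small two-allele sketch. This is not the authors' implementation: it assumes a Moran-type chain over allele counts in a hypothetical sample of size `n`, with mutation only out of the two fixed states and a multiplicative fixation bias on the favoured allele. The function names and parameters (`moran_rates`, `fixation_probability`, `mu_up`, `mu_down`, `bias`) are illustrative.

```python
def moran_rates(n, mu_up, mu_down, bias):
    """Rate matrix over states 0..n, where state i is the number of
    copies of the favoured allele (assumed parameterization)."""
    q = [[0.0] * (n + 1) for _ in range(n + 1)]
    # Mutation is only allowed out of the fixed (monomorphic) states.
    q[0][1] = mu_up
    q[n][n - 1] = mu_down
    # Polymorphic states change by drift; the favoured allele gains
    # copies at a rate inflated by the fixation bias.
    for i in range(1, n):
        drift = i * (n - i) / n
        q[i][i + 1] = drift * (1.0 + bias)
        q[i][i - 1] = drift
    # Diagonal entries make every row sum to zero.
    for i in range(n + 1):
        q[i][i] = -sum(q[i][j] for j in range(n + 1) if j != i)
    return q


def fixation_probability(q, start=1):
    """Probability that the favoured allele reaches count n before 0,
    starting from `start` copies and ignoring further mutation
    (standard birth-death chain formula)."""
    n = len(q) - 1
    terms = []
    prod = 1.0
    for k in range(n):
        terms.append(prod)
        if k + 1 < n:
            # Ratio of loss rate to gain rate in state k + 1.
            prod *= q[k + 1][k] / q[k + 1][k + 2]
    return sum(terms[:start]) / sum(terms)


if __name__ == "__main__":
    neutral = moran_rates(10, 1e-3, 2e-3, 0.0)
    biased = moran_rates(10, 1e-3, 2e-3, 0.5)
    print(fixation_probability(neutral))
    print(fixation_probability(biased))
```

In this sketch a new mutation without bias fixes with probability 1/n, while a positive bias raises that probability. This is how a fixation bias such as GC-biased gene conversion can be distinguished from the mutation rates `mu_up` and `mu_down`, which only control how often polymorphisms arise.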
