Inference of natural selection from interspersed genomic elements based on polymorphism and divergence.

We present a new probabilistic method for measuring the influence of natural selection on a collection of short elements scattered across a genome, based on observed patterns of polymorphism and divergence. This task is challenging for several reasons, including variation across loci in mutation rates and genealogical backgrounds, and the influence of demography on patterns of polymorphism. In addition, accounting for the combined effects of different modes of selection is known to be a serious challenge for tests of selection that use patterns of polymorphism and divergence. Our method addresses these challenges by contrasting patterns of polymorphism and divergence in the elements of interest with those in flanking neutral sites. While this general approach is common to several existing tests of selection, our method improves substantially on them by making use of a full generative probabilistic model, directly accommodating weak negative selection, allowing information from many short elements to be combined in a statistically rigorous manner, and integrating phylogenetic information from multiple outgroup species with genome-wide population genetic data. Our model accounts for the effects of weak negative, strong negative, and strong positive selection by making a small set of simple assumptions about their separate effects on polymorphism and divergence. We implemented an expectation-maximization algorithm for inference under this model and applied it to simulated and real data. Using simulations, we show that our inference procedure effectively disentangles the different modes of selection and provides accurate estimates of the parameters of interest that are robust to demography. We demonstrate an application of our method to real data by analyzing several collections of human transcription factor binding sites identified using recently generated genome-wide chromatin immunoprecipitation and sequencing data.
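To make the contrast between elements and flanking neutral sites concrete, the following minimal sketch computes a McDonald-Kreitman-style neutrality index from pooled counts. It illustrates the general approach shared by existing tests, not the generative model or the expectation-maximization procedure described above. The `Counts` class, the function names, and the numbers in the usage note are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Pooled site counts for one class of sites (hypothetical container)."""
    polymorphic: int  # sites segregating within the population sample
    divergent: int    # sites fixed for a difference relative to the outgroup


def neutrality_index(elements: Counts, flanks: Counts) -> float:
    """Polymorphism-to-divergence ratio in elements, relative to flanking sites.

    Values above 1 indicate an excess of polymorphism in the elements;
    values below 1 indicate an excess of divergence.
    """
    if min(elements.divergent, flanks.polymorphic, flanks.divergent) == 0:
        raise ValueError("counts are too sparse for a ratio estimate")
    element_ratio = elements.polymorphic / elements.divergent
    flank_ratio = flanks.polymorphic / flanks.divergent
    return element_ratio / flank_ratio


def adaptive_fraction(elements: Counts, flanks: Counts) -> float:
    """Simple estimate of the fraction of element divergence driven by
    positive selection, computed as 1 minus the neutrality index.

    Assumption: polymorphism in the elements is effectively neutral. Weakly
    deleterious polymorphisms violate this and bias the estimate downward.
    """
    return 1.0 - neutrality_index(elements, flanks)
```

For example, with hypothetical counts of 20 polymorphic and 40 divergent sites in the elements and 100 of each in the flanks, `neutrality_index(Counts(20, 40), Counts(100, 100))` gives 0.5 and `adaptive_fraction` gives 0.5. Ratio estimates of this kind treat all polymorphism in the elements as neutral, so they do not separate weak negative selection from the other modes.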
