Analysis of Sequence Conservation at Nucleotide Resolution

One of the major goals of comparative genomics is to understand the evolutionary history of each nucleotide in the human genome sequence, and the degree to which it is under selective pressure. Ascertainment of selective constraint at nucleotide resolution is particularly important for predicting the functional significance of human genetic variation and for analyzing the sequence substructure of cis-regulatory sequences and other functional elements. Current methods for analysis of sequence conservation focus on delineating conserved regions comprising tens or even hundreds of consecutive nucleotides. We therefore developed a novel computational approach designed specifically for scoring evolutionary conservation at individual base-pair resolution. Our approach estimates the rate at which each nucleotide position is evolving, computes the probability of neutrality given this rate estimate, and summarizes the result in a Sequence CONservation Evaluation (SCONE) score. We computed SCONE scores in a continuous fashion across 1% of the human genome for which high-quality sequence information from up to 23 genomes is available. We show that SCONE scores are clearly correlated with the allele frequency of human polymorphisms in both coding and noncoding regions. We find that the majority of noncoding conserved nucleotides lie outside of longer conserved elements predicted by other conservation analyses, and are experiencing ongoing selection in modern humans, as is evident from the allele frequency spectrum of human polymorphism. We also applied SCONE to analyze the distribution of conserved nucleotides within functional regions. These regions are markedly enriched in individually conserved positions and short (<15 bp) conserved “chunks.” Our results collectively suggest that the majority of functionally important noncoding conserved positions are highly fragmented and reside outside of canonically defined long conserved noncoding sequences. A small subset of these fragmented positions may be identified with high confidence.
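
The three steps named in the abstract (estimate a per-position rate, ask how compatible that rate is with neutrality, report a score) can be illustrated with a toy sketch. This is not the SCONE algorithm: the abstract does not specify the substitution model, tree, or probability calculation, so everything below is an assumption chosen for brevity. The sketch assumes a Jukes-Cantor model, a star phylogeny with hypothetical neutral branch lengths, a grid search for the maximum-likelihood rate, and a likelihood-ratio test standing in for the "probability of neutrality." All function names are hypothetical.

```python
import math

BASES = "ACGT"


def jc_prob(same: bool, d: float) -> float:
    """Jukes-Cantor probability of observing the same/different base after distance d."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def column_log_likelihood(column: str, branch_lengths: list[float], rate: float) -> float:
    """Log-likelihood of one alignment column on a star tree (assumed topology).

    Each neutral branch length is multiplied by `rate`; rate = 1 means the
    position evolves at the neutral rate, rate < 1 means it evolves more slowly.
    """
    total = 0.0
    for root in BASES:  # sum over the unknown ancestral base, uniform prior
        p = 0.25
        for base, t in zip(column, branch_lengths):
            p *= jc_prob(base == root, rate * t)
        total += p
    return math.log(total)


def estimate_rate(column: str, branch_lengths: list[float]) -> float:
    """Maximum-likelihood rate over a coarse grid from 0.01 to 5.00 (assumed range)."""
    grid = [0.01 * k for k in range(1, 501)]
    return max(grid, key=lambda r: column_log_likelihood(column, branch_lengths, r))


def neutrality_test(column: str, branch_lengths: list[float]) -> tuple[float, float]:
    """Return (estimated rate, p-value of a likelihood-ratio test against rate = 1)."""
    r_hat = estimate_rate(column, branch_lengths)
    ll_hat = column_log_likelihood(column, branch_lengths, r_hat)
    ll_neutral = column_log_likelihood(column, branch_lengths, 1.0)
    stat = max(0.0, 2.0 * (ll_hat - ll_neutral))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return r_hat, p_value


if __name__ == "__main__":
    neutral_branches = [0.3] * 20  # hypothetical neutral distances for 20 species
    for col in ("A" * 20, "ACGT" * 5):
        rate, p = neutrality_test(col, neutral_branches)
        print(f"{col}: rate={rate:.2f}, p={p:.3g}")
```

In this toy setting a fully conserved column yields a rate estimate near the bottom of the grid and a small p-value, which is the qualitative behavior one would want from a per-nucleotide conservation score. A realistic implementation would need the actual species tree, a richer substitution model, and handling of alignment gaps, none of which are described in the abstract.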
