28-way vertebrate alignment and conservation track in the UCSC Genome Browser.

This article describes a set of alignments of 28 vertebrate genome sequences provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu. Three examples illustrate the power of this resource for exploring vertebrate and mammalian evolution. First, we present several vignettes involving insertions and deletions within protein-coding regions, including a look at some human-specific indels. Second, we study the extent to which start codons and stop codons in the human sequence are conserved in other species, showing that start codons are in general more poorly conserved than stop codons. Finally, an investigation of the phylogenetic depth of conservation for several classes of functional elements in the human genome reveals striking differences in the rates and modes of decay in alignability. Each functional class has a distinctive period of stringent constraint, followed by a decay that either tolerates insertions and deletions (regulatory regions) or rejects them (coding regions and ultraconserved elements).
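The start-codon versus stop-codon comparison can be pictured with a small sketch. The Python below is illustrative only and is not the procedure used in the article: it assumes the three bases aligned to a human start or stop codon have already been extracted for each species, and the function names, species labels, and toy sequences are all hypothetical.

```python
# Illustrative sketch only: the data and helper names are hypothetical.
# Input: for one human codon, the three bases each other species has
# aligned to it ("---" marks a gap in the alignment).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_start(codon: str) -> bool:
    """True if the aligned triplet is the canonical start codon ATG."""
    return codon.upper() == "ATG"


def is_stop(codon: str) -> bool:
    """True if the aligned triplet is any of the three standard stop codons."""
    return codon.upper() in STOP_CODONS


def conserved_fraction(aligned_codons: dict, predicate) -> float:
    """Fraction of species whose aligned triplet satisfies the predicate.

    Gapped or substituted triplets count as not conserved.
    """
    if not aligned_codons:
        return 0.0
    hits = sum(1 for codon in aligned_codons.values() if predicate(codon))
    return hits / len(aligned_codons)


# Toy (made-up) columns for a single human gene.
toy_start = {"chimp": "ATG", "mouse": "ATG", "dog": "CTG", "chicken": "---"}
toy_stop = {"chimp": "TGA", "mouse": "TAA", "dog": "TAG", "chicken": "TGG"}

start_fraction = conserved_fraction(toy_start, is_start)
stop_fraction = conserved_fraction(toy_stop, is_stop)
```

A stop codon counts as conserved here if any of the three stop codons is present, whereas a start codon must be exactly ATG; this is one simple definition among several possible ones, and the article may use a different criterion.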
