Ensembl comparative genomics resources

Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary questions. However, the complexity and computational requirements of producing such data are substantial; as a result, only a small number of reference resources are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set, facilitating comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments, from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available, including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects, including Ensembl Genomes and Gramene, and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates make Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org.
