Ensembl 2018

Abstract The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.

Astrid Gall | Alessandro Vullo | Jane Loveland | Matthew R. Laird | Daniel R. Zerbino | Laurent Gil | Matthieu Muffato | Zhicheng Liu | Jonathan M. Mudge | Daniel Barrell | Andrew D. Yates | William M. McLaren | Magali Ruffier | Kieron R. Taylor | Helen Schuilenburg | Michael Nuhn | Paul Flicek | Daniel M. Staines | Brandon Walts | Thomas Juettemann | Sarah E. Hunt | Konstantinos Billis | Benjamin Moore | Emily Perry | Bronwen L. Aken | Carlos García-Girón | Thibaut Hourlier | Fergal J. Martin | Daniel N. Murphy | Amonida Zadissa | Carla A. Cummins | Stephen J. Trevanion | Adam Frankish | Daniel Sheppard | Mateus Patricio | Thomas Maurel | Victoria Newman | Chuang Kee Ong | Helen Sparrow | Nicholas Langridge | Premanand Achuthan | Anja Thormann | Anne Parker | Fiona Cunningham | Wasiu A. Akanni | M. Ridwan Amode | Sophie H. Janacek | Ilias Lavidas | Harpreet Singh Riat | Osagie G. Izuogu | Myrto Kostadima | Jyothish Bhai | Leanne Haggerty | Erin Haskell | Denye N. Ogeh | Leo Gordon | Jimmy Kiang To
