Ensembl BioMarts: a hub for data retrieval across taxonomic space

For a number of years, the BioMart data warehousing system has proven to be a valuable resource for scientists seeking a fast and versatile means of accessing the growing volume of genomic data provided by the Ensembl project. The Ensembl Genomes project, launched in 2009, complements Ensembl by using the same visualization, interactive and programming tools to give users access to genome data from a further five domains: protists, bacteria, metazoa, plants and fungi. Together, the Ensembl and Ensembl Genomes BioMarts provide a point of access to high-quality gene annotation, variation data, functional and regulatory annotation, and evolutionary relationships for genomes spanning the taxonomic space. This article gives an overview of the Ensembl and Ensembl Genomes BioMarts, together with some useful examples and a description of current data content and future objectives.

Database URLs: http://www.ensembl.org/biomart/martview/; http://metazoa.ensembl.org/biomart/martview/; http://plants.ensembl.org/biomart/martview/; http://protists.ensembl.org/biomart/martview/; http://fungi.ensembl.org/biomart/martview/; http://bacteria.ensembl.org/biomart/martview/
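As an illustration of the programmatic access mentioned above, the following is a minimal sketch of how a BioMart query might be assembled in Python. It is not taken from the article. It assumes the XML query format accepted by BioMart's `martservice` endpoint, and the endpoint path, the dataset name `hsapiens_gene_ensembl`, and the attribute and filter names are all assumptions that should be checked against the live mart before use. The sketch only builds the query and the request URL; it does not contact any server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint: the article lists the interactive "martview" pages;
# "martservice" is the companion path commonly used for scripted queries.
MARTSERVICE_URL = "http://www.ensembl.org/biomart/martservice"


def build_query_xml(dataset, attributes, filters=None):
    """Return a BioMart-style XML query as a string.

    dataset    -- mart dataset name (assumed, e.g. "hsapiens_gene_ensembl")
    attributes -- list of column names to retrieve
    filters    -- optional dict mapping filter name to filter value
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "0",        # no header row
        "uniqueRows": "1",    # collapse duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def build_query_url(query_xml, base_url=MARTSERVICE_URL):
    """Return a GET URL carrying the XML query in the 'query' parameter."""
    return base_url + "?" + urlencode({"query": query_xml})


if __name__ == "__main__":
    # Hypothetical example: gene IDs and names on human chromosome 21.
    xml = build_query_xml(
        "hsapiens_gene_ensembl",
        ["ensembl_gene_id", "external_gene_name"],
        filters={"chromosome_name": "21"},
    )
    print(build_query_url(xml))
```

The same query could be pointed at one of the Ensembl Genomes sites by passing a different `base_url`, provided the dataset and attribute names are replaced with ones that exist in that mart.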
