Expression Atlas: gene and protein expression across multiple studies and organisms

Abstract

Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database that provides information about gene and protein expression in different species and contexts, such as tissue, developmental stage, disease or cell type. Public and controlled-access data sets from different sources are curated, re-analysed using standardized, open-source pipelines, and made available for querying, download and visualization. As of August 2017, Expression Atlas holds data from 3,126 studies across 33 different species, including 731 studies from plants. Data from large-scale RNA sequencing studies, including Blueprint, PCAWG, ENCODE, GTEx and HipSci, can be visualized next to each other. In Expression Atlas, users can query genes or gene sets of interest and explore their expression across or within species, tissues and developmental stages, either in a constitutive context or in a differential context that represents the effects of diseases, conditions or experimental interventions. All processed data matrices are available for direct download in tab-delimited format or as R data. In addition to the web interface, data sets can now be searched and downloaded through the Expression Atlas R package. Novel features and visualizations include on-the-fly analysis of gene set overlaps and the option to view gene co-expression in experiments investigating constitutive gene expression across tissues or other conditions.
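To illustrate what working with a downloaded tab-delimited matrix might look like, the following is a minimal Python sketch. The column names (`Gene ID`, `Gene Name`), the gene identifiers, the tissue columns and all values are hypothetical placeholders chosen for illustration; they are not taken from an actual Expression Atlas file, and the real downloads may be laid out differently.

```python
import csv
import io

# Hypothetical sample in a tab-delimited layout. Column names, gene IDs and
# values are illustrative assumptions, not real Expression Atlas output.
SAMPLE_TSV = (
    "Gene ID\tGene Name\tliver\tbrain\theart\n"
    "GENE0001\tgeneA\t12.5\t0.0\t3.2\n"
    "GENE0002\tgeneB\t0.4\t48.1\t\n"
)


def load_expression_matrix(handle, id_column="Gene ID", name_column="Gene Name"):
    """Parse a tab-delimited matrix into {gene_id: {condition: value}}.

    Empty cells are treated as missing and skipped.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    matrix = {}
    for row in reader:
        matrix[row[id_column]] = {
            condition: float(value)
            for condition, value in row.items()
            if condition not in (id_column, name_column) and value
        }
    return matrix


def conditions_above(matrix, gene_id, threshold):
    """Return the sorted conditions in which a gene meets the threshold."""
    return sorted(c for c, v in matrix[gene_id].items() if v >= threshold)


matrix = load_expression_matrix(io.StringIO(SAMPLE_TSV))
print(conditions_above(matrix, "GENE0001", 1.0))
```

For a real file, the `io.StringIO` wrapper would be replaced by `open(path, newline="")`, and the identifier and name columns adjusted to match the header of the downloaded matrix.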
