ArrayExpress update—trends in database growth and links to data analysis tools

The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international public repositories for functional genomics data that support peer-reviewed publications, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays across more than 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years, reaching 15% of all new data in 2012. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects (for microarray data) and binary alignment format files (for sequencing data) have been generated for a significant proportion of ArrayExpress data.
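
Because MAGE-TAB is a tab-delimited, spreadsheet-style format, its sample table (the SDRF file) can be read with generic tools, which is part of what makes linking to analysis software straightforward. The following is a minimal sketch in Python, not the ArrayExpress or Bioconductor implementation. The sample rows, file names, and the helper functions `read_sdrf` and `files_by_factor` are hypothetical and serve only as illustration; the column headers follow the usual SDRF naming pattern.

```python
import csv
import io

# Hypothetical SDRF excerpt (tab-delimited). The values are illustrative
# and do not come from a real ArrayExpress experiment.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tFactor Value[genotype]\tArray Data File\n"
    "sample 1\tHomo sapiens\twild type\tsample1.CEL\n"
    "sample 2\tHomo sapiens\tknockout\tsample2.CEL\n"
)


def read_sdrf(text):
    """Return one dict per SDRF row, keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


def files_by_factor(rows, factor_column, file_column="Array Data File"):
    """Group data file names by the value of one experimental factor column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[factor_column], []).append(row[file_column])
    return groups


rows = read_sdrf(SDRF_TEXT)
groups = files_by_factor(rows, "Factor Value[genotype]")
print(groups)
```

Real SDRF files may repeat column headers (for example, several protocol reference columns), in which case a dict-per-row reader keeps only the last value for each repeated header; a fuller parser would need to track columns by position.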
