NCBI GEO: mining millions of expression profiles—database and tools

The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest fully public repository for high-throughput molecular abundance data, primarily gene expression data. The database has a flexible and open design that allows the submission, storage and retrieval of many data types. These data include microarray-based experiments measuring the abundance of mRNA, genomic DNA and protein molecules, as well as non-array-based technologies such as serial analysis of gene expression (SAGE) and mass spectrometry proteomic technology. GEO currently holds over 30 000 submissions representing approximately half a billion individual molecular abundance measurements, for over 100 organisms. Here, we describe recent database developments that facilitate effective mining and visualization of these data. Features are provided to examine data from both experiment- and gene-centric perspectives using user-friendly Web-based interfaces accessible to those without computational or microarray-related analytical expertise. The GEO database is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.
