NCBI GEO: archive for functional genomics data sets—10 years on

A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20 000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.
