GeneSigDB—a curated database of gene expression signatures

The primary objective of most gene expression studies is the identification of one or more gene signatures: lists of genes whose transcriptional levels are uniquely associated with a specific biological phenotype. Although thousands of experimentally derived gene signatures have been published, their potential value to the community is limited by their computational inaccessibility. Gene signatures are embedded in the figures, tables or supplementary materials of published articles, and are frequently presented using non-standard gene or probeset nomenclature. We present GeneSigDB (http://compbio.dfci.harvard.edu/genesigdb), a manually curated database of gene expression signatures. GeneSigDB release 1.0 focuses on cancer and stem cell gene signatures and was constructed from more than 850 publications, from which we manually transcribed 575 gene signatures. Most gene signatures (n = 560) were successfully mapped to the genome to extract standardized lists of EnsEMBL gene identifiers. For each signature, GeneSigDB provides the original gene signature, the standardized gene list and a fully traceable mapping history for each gene, from the original transcribed data table through to the standardized list of genes. The GeneSigDB web portal is easy to search, allows users to compare their own gene lists to those in the database, and lets them download gene signatures in most common gene identifier formats.
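As an illustration of what comparing a user's gene list to a stored signature can involve, the sketch below computes the shared genes, the Jaccard index and a one-sided hypergeometric overlap p-value. It is a minimal example under stated assumptions, not a description of how the GeneSigDB portal performs its comparison: the function name `overlap_stats`, the placeholder identifiers and the choice of statistics are all hypothetical, and both lists are assumed to have already been mapped to the same identifier type.

```python
from math import comb


def overlap_stats(query, signature, universe_size):
    """Compare two gene lists that use the same identifier type.

    Returns (shared genes, Jaccard index, hypergeometric P(X >= shared)).
    universe_size is the assumed number of genes that could have been
    selected (for example, all genes measured on the platform).
    """
    q, s = set(query), set(signature)
    if universe_size < len(q | s):
        raise ValueError("universe_size is smaller than the combined gene lists")

    shared = q & s
    union = q | s
    jaccard = len(shared) / len(union) if union else 0.0

    # Upper tail of the hypergeometric distribution: probability of seeing
    # at least len(shared) signature genes when drawing len(q) genes at
    # random, without replacement, from the universe.
    k, n_q, n_s = len(shared), len(q), len(s)
    total = comb(universe_size, n_q)
    tail = sum(
        comb(n_s, i) * comb(universe_size - n_s, n_q - i)
        for i in range(k, min(n_q, n_s) + 1)
    )
    return shared, jaccard, tail / total


# Placeholder identifiers, purely illustrative (not real EnsEMBL IDs).
user_list = ["GENE_A", "GENE_B", "GENE_C"]
stored_signature = ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]
shared, jaccard, p_value = overlap_stats(user_list, stored_signature, universe_size=10)
print(sorted(shared), jaccard, p_value)
```

Working with standardized identifiers matters here: set intersection is only meaningful once both lists use the same nomenclature, which is the problem the standardized gene lists are meant to address.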
