Microarray meta-analysis database (M2DB): a uniformly pre-processed, quality controlled, and manually curated human clinical microarray database

Background
Over the past decade, gene expression microarray studies have greatly expanded our knowledge of the genetic mechanisms of human diseases. Meta-analysis of the substantial amounts of accumulated data, which integrates valuable information from multiple studies, is becoming more important in microarray research. However, collecting data of special interest from public microarray repositories often presents major practical problems. Moreover, including low-quality data may significantly reduce meta-analysis efficiency.

Results
M2DB is a manually curated microarray database designed for easy querying based on clinical information and for interactive retrieval of either raw or uniformly pre-processed data, along with a set of quality-control (QC) metrics. The database contains more than 10,000 previously published Affymetrix GeneChip arrays performed on human clinical specimens. M2DB allows online querying according to a flexible combination of five clinical annotations describing disease state and sampling location. These annotations were manually curated using controlled vocabularies, based on information obtained from GEO, ArrayExpress, and published papers. For array-based quality assessment, the online query provides sets of QC metrics generated using three available QC algorithms. Arrays with poor data quality can easily be excluded through the query interface. The query also provides values from two algorithms for gene-based filtering, and offers raw data and three kinds of pre-processed data for download.

Conclusion
M2DB provides a user-friendly interface for QC parameters, sample clinical annotations, and data formats to help users obtain clinical metadata. The database lowers the entry threshold for meta-analysis and provides an integrated process for carrying it out. We hope that this research will promote further evolution of microarray meta-analysis.
