chip artifact CORRECTion (caCORRECT): A Bioinformatics System for Quality Assurance of Genomics and Proteomics Array Data

Quality assurance of high-throughput “-omics” data is a major concern for biomedical discovery and translational medicine, and is considered a top priority in bioinformatics and systems biology. Here, we report a web-based bioinformatics tool called caCORRECT for chip artifact detection, analysis, and CORRECTion, which removes systematic artifactual noise commonly observed in microarray gene expression data. Despite the development of major databases such as GEO, ArrayExpress, caArray, and the SMD to manage and distribute microarray data to the public, reproducibility has been questioned in many cases, including high-profile papers and datasets. Based on both archived and synthetic data, we have designed caCORRECT to have several advanced features: (1) to uncover significant, correctable artifacts that affect the reproducibility of experiments; (2) to improve the integrity and quality of public archives by removing artifacts; (3) to provide a universal quality score to aid users in their selection of suitable microarray data; and (4) to improve the true-positive rate of biomarker selection, as verified by test data. These features are expected to improve the reproducibility of microarray studies. caCORRECT is freely available at: http://caCORRECT.bme.gatech.edu.
